Baumeister W.
J. Bacteriol. 176:1224-1233(1994).

[3]
Lemaire M., Ohayon H., Gounon P., Fujino T., Beguin P.
J. Bacteriol. 177:2451-2459(1995).


Smr 




Smr domain 


Accession number: PF01713
Definition: Smr domain
Author: Bateman A
Alignment method of seed: Clustalw
Source of seed members: [1]
Gathering cutoffs: 0 0
Trusted cutoffs: 1.40 1.40
Noise cutoffs: -7.90 -7.90
HMM build command line: hmmbuild HMM SEED
HMM build command line: hmmcalibrate --seed 0 HMM
Reference Number: [1]
Reference Medline: 10431172
Reference Title: Smr: a bacterial and eukaryotic homologue of the C-terminal
Reference Title: region of the MutS2 family.
Reference Author: Moreira D, Philippe H;
Reference Location: Trends Biochem Sci 1999;24:298-300.
Database Reference INTERPRO; IPR002625;
Comment: This family includes the Smr (Small MutS Related) proteins,
Comment: and the C-terminal region of the MutS2 protein. It has been
Comment: suggested that this domain interacts with the MutS1
Comment: Swiss:P23909 protein in the case of Smr proteins and with
Comment: the N-terminal MutS related region of MutS2 Swiss:P94545 [1].
Number of members: 14
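
The three cutoff lines in a record such as the one above each carry two bit scores. The sketch below assumes the usual Pfam convention, which the record itself does not spell out: the first value applies to the whole sequence and the second to a single domain, a hit is accepted when it reaches the gathering cutoff, the trusted cutoff is the lowest accepted score, and the noise cutoff is the highest rejected score. The function names are hypothetical and chosen only for illustration.

```python
from typing import Tuple

Pair = Tuple[float, float]  # (per-sequence score, per-domain score), assumed order


def parse_cutoffs(line: str) -> Pair:
    """Split a '<name> cutoffs: <sequence> <domain>' line into two floats."""
    _, _, values = line.partition(":")
    seq_score, dom_score = (float(v) for v in values.split())
    return seq_score, dom_score


def passes_gathering(seq_bits: float, dom_bits: float, gathering: Pair) -> bool:
    """Accept a hit only if both scores reach the gathering cutoffs."""
    return seq_bits >= gathering[0] and dom_bits >= gathering[1]


def cutoffs_consistent(gathering: Pair, trusted: Pair, noise: Pair) -> bool:
    """Check the assumed ordering noise < gathering <= trusted for both scores."""
    return all(n < g <= t for g, t, n in zip(gathering, trusted, noise))


if __name__ == "__main__":
    # Values copied from the Smr domain record (PF01713).
    gathering = parse_cutoffs("Gathering cutoffs: 0 0")
    trusted = parse_cutoffs("Trusted cutoffs: 1.40 1.40")
    noise = parse_cutoffs("Noise cutoffs: -7.90 -7.90")
    print(cutoffs_consistent(gathering, trusted, noise))
    print(passes_gathering(1.40, 1.40, gathering))
```

Under these assumptions a hit scoring at the trusted cutoff is accepted and one scoring at the noise cutoff is rejected, which is a quick sanity check when transcribing cutoff values by hand.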


SRF-TF 


PDOC00302 


MADS-box domain
signature and profile


A number of transcription factors contain a conserved domain of 56 amino-acid
residues, sometimes known as the MADS-box domain [E1]. They are listed below:

- Serum response factor (SRF) [1], a mammalian transcription factor that
  binds to the Serum Response Element (SRE). This is a short sequence of dyad
  symmetry located 300 bp to the 5' end of the transcription initiation site
  of genes such as c-fos.
- Mammalian myocyte-specific enhancer factors 2A to 2D (MEF2A to MEF2D).
  These proteins are transcription factors which bind specifically to the
  MEF2 element present in the regulatory regions of many muscle-specific
  genes.
- Drosophila myocyte-specific enhancer factor 2 (MEF2).
- Yeast GRM/PRTF protein (gene MCM1) [2], a transcriptional regulator of
  mating-type-specific genes.
- Yeast arginine metabolism regulation protein 1 (gene ARGR1 or ARG80).
- Yeast transcription factor RLM1.
- Yeast transcription factor SMP1.
- Arabidopsis thaliana agamous protein (AG) [3], a probable transcription
  factor involved in regulating genes that determine stamen and carpel
  development in wild-type flowers. Mutations in the AG gene result in the
  replacement of the stamens by petals and the carpels by a new flower.
- Arabidopsis thaliana homeotic proteins Apetala1 (AP1), Apetala3 (AP3) and
  Pistillata (PI) which act locally to specify the identity of the floral
  meristem and to determine sepal and petal development [4].
- Antirrhinum majus and tobacco homeotic proteins deficiens (DEFA) and globosa
  (GLO) [5]. Both proteins are transcription factors involved in the genetic
  control of flower development. Mutations in DEFA or GLO cause the
  transformation of petals into sepals and of stamina into carpels.
- Arabidopsis thaliana putative transcription factors AGL1 to AGL6 [6].
- Antirrhinum majus morphogenetic protein DEF H33 (squamosa).

In SRF, the conserved domain has been shown [1] to be involved in DNA-binding
and dimerization. We have derived a pattern that spans the complete length of
the domain. The profile also spans the length of the MADS-box.

Description of pattern(s) and/or profile(s)

Consensus pattern R-x-[RK]-x(5)-I-x-[DNGSK]-x(3)-[KR]-x(2)-T-[FY]-x-[RK](3)-
x(2)-[LIVM]-x-K(2)-A-x-E-[LIVM]-[STA]-x-L-x(4)-[LIVM]-x-[LIVM](3)-x(6)-
[LIVMF]-x(2)-[FY]

Sequences known to belong to this class detected by the pattern ALL.
Other sequence(s) detected in SWISS-PROT NONE.

Sequences known to belong to this class detected by the profile ALL.
Other sequence(s) detected in SWISS-PROT NONE.

Note this documentation entry is linked to both signature patterns
and a profile. As the profile is much more sensitive than the
patterns, you should use it if you have access to the necessary
software tools to do so.
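
Consensus patterns in this notation can be translated mechanically into regular expressions. The following is a minimal sketch, not the scanning software the note refers to: it handles only the elements used in these entries (single residues, `[...]` alternatives, `{...}` exclusions, `x` wildcards and `(n)` or `(n,m)` repeats) and ignores terminal anchors. The function name `prosite_to_regex` is hypothetical.

```python
import re

# One pattern element: a residue, a [set], a {excluded set} or x,
# optionally followed by a repeat count (n) or range (n,m).
_ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a dash-separated consensus pattern into a Python regex string."""
    pieces = []
    for element in pattern.strip().rstrip(".").split("-"):
        match = _ELEMENT.fullmatch(element.strip())
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = match.groups()
        if residue == "x":
            residue = "."                           # any amino acid
        elif residue.startswith("{"):
            residue = "[^" + residue[1:-1] + "]"    # any residue except these
        if low is None:
            repeat = ""
        elif high is None:
            repeat = "{" + low + "}"
        else:
            repeat = "{" + low + "," + high + "}"
        pieces.append(residue + repeat)
    return "".join(pieces)


# The MADS-box consensus pattern from this entry.
MADS_BOX = (
    "R-x-[RK]-x(5)-I-x-[DNGSK]-x(3)-[KR]-x(2)-T-[FY]-x-[RK](3)-x(2)-"
    "[LIVM]-x-K(2)-A-x-E-[LIVM]-[STA]-x-L-x(4)-[LIVM]-x-[LIVM](3)-x(6)-"
    "[LIVMF]-x(2)-[FY]"
)
MADS_BOX_REGEX = re.compile(prosite_to_regex(MADS_BOX))

if __name__ == "__main__":
    print(MADS_BOX_REGEX.pattern)
```

A protein sequence held as a plain one-letter string can then be scanned with `MADS_BOX_REGEX.search(sequence)`. A regular expression reproduces only the pattern, not the profile, so it shares the pattern's lower sensitivity.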
Last update 

July 1999 / Pattern and text revised. 

References 

[1]
Norman C., Runswick M., Pollock R., Treisman R.
Cell 55:989-1003(1988).

[2]
Passmore S., Maine G.T., Elble R., Christ C., Tye B.-K.
J. Mol. Biol. 204:593-606(1988).

[3]
Yanofsky M., Ma H., Bowman J., Drews G., Feldmann K.A.,
Meyerowitz E.M.
Nature 346:35-39(1990).

[4]
Goto K., Meyerowitz E.M.
Genes Dev. 8:1548-1560(1994).

[5]
Troebner W., Ramirez L., Motte P., Hue I., Huijser P., Loennig W.-E.,
Saedler H., Sommer H., Schwartz-Sommer Z.
EMBO J. 11:4693-4704(1992).

[6]
Ma H., Yanofsky M.F., Meyerowitz E.M.
Genes Dev. 5:484-495(1991).

[E1]
http://transfac.gbf-braunschweig.de/cgi-bin/qt/getEntry.pl?C0014
SRP19 protein


Accession number: PF01922
Definition: SRP19 protein
Author: Enright A, Ouzounis C, Bateman A
Alignment method of seed: Clustalw
Source of seed members: Enright A
Gathering cutoffs: 25 25
Trusted cutoffs: 31.20 31.20
Noise cutoffs: -28.50 -28.50
HMM build command line: hmmbuild -F HMM SEED
HMM build command line: hmmcalibrate --seed 0 HMM
Reference Number: [1]
Reference Medline: 89041541
Reference Title: Isolation and characterization of a cDNA clone encoding the
Reference Title: 19 kDa protein of signal recognition particle (SRP);
Reference Title: expression and binding to 7SL RNA.
Reference Author: Lingelbach K, Zwieb C, Webb JR, Marshallsay C, Hoben PJ,
Reference Author: Walter P, Dobberstein B;
Reference Location: Nucleic Acids Res 1988;16:9431-9442.
Reference Number: [2]
Reference Medline: 92220168
Reference Title: SEC65 gene product is a subunit of the yeast signal
Reference Title: recognition particle required for its integrity.
Reference Author: Hann BC, Stirling CJ, Walter P;
Reference Location: Nature 1992;356:532-533.
Reference Number: [3]
Reference Medline: 92220169
Reference Title: The S. cerevisiae SEC65 gene encodes a component of yeast
Reference Title: signal recognition particle with homology to human SRP19.
Reference Author: Stirling CJ, Hewitt EW;
Reference Location: Nature 1992;356:534-537.
Database Reference INTERPRO; IPR002778;
Comment: The signal recognition particle (SRP) binds to the signal peptide of
Comment: proteins as they are being translated. The binding of the SRP halts
Comment: translation and the complex is then transported to the endoplasmic
Comment: reticulum's cytoplasmic surface. The SRP then aids translocation of
Comment: the protein through the ER membrane. The SRP is a ribonucleoprotein
Comment: that is composed of a small RNA and several proteins. One of these
Comment: proteins is the SRP19 protein [1] (Sec65 in yeast [2,3]).
Number of members: 13
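
Family records of this kind are sequences of `Field: value` lines in which some fields (Reference Title, Comment) repeat to carry wrapped text. As an illustration only, the sketch below collects such lines into a dictionary of lists. It assumes one field per line with the first colon as separator; lines without a colon, such as the `Database Reference INTERPRO;` lines, are skipped. The name `parse_entry` and the sample text are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def parse_entry(text: str) -> Dict[str, List[str]]:
    """Group 'Field: value' lines by field name, keeping repeated fields in order."""
    fields: Dict[str, List[str]] = defaultdict(list)
    for line in text.splitlines():
        # Split on the first colon only, so values like 'Swiss:P23909' stay intact.
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip()].append(value.strip())
    return dict(fields)


SAMPLE = """\
Accession number: PF01922
Definition: SRP19 protein
Gathering cutoffs: 25 25
Comment: One of these
Comment: proteins is the SRP19 protein [1] (Sec65 in yeast [2,3]).
Number of members: 13
"""

if __name__ == "__main__":
    entry = parse_entry(SAMPLE)
    print(entry["Accession number"][0])
    # Repeated fields are continuation lines; join them to recover the sentence.
    print(" ".join(entry["Comment"]))
```

Keeping each field as a list avoids merging values that only look like continuations, for example the two separate "HMM build command line" entries.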


SSB 


PDOC00602


Single-strand binding
protein family signatures


The Escherichia coli single-strand binding protein [1] (gene ssb), also known
as the helix-destabilizing protein, is a protein of 177 amino acids. It
binds tightly, as a homotetramer, to single-stranded DNA (ss-DNA) and plays an
important role in DNA replication, recombination and repair.

Closely related variants of SSB are encoded in the genome of a variety of
large self-transmissible plasmids. SSB has also been characterized in bacteria
such as Proteus mirabilis or Serratia marcescens.

Eukaryotic mitochondrial proteins that bind ss-DNA and are probably involved
in mitochondrial DNA replication are structurally and evolutionarily related to
prokaryotic SSB. Proteins currently known to belong to this subfamily are
listed below [2].

- Mammalian protein Mt-SSB (P16).
- Xenopus Mt-SSBs and Mt-SSBr.
- Drosophila MtSSB.
- Yeast protein RIM1.

We have developed two signature patterns for these proteins. The first is a
conserved region in the N-terminal section of the SSB's. The second is a
centrally located region which, in Escherichia coli SSB, is known to be
involved in the binding of DNA.
Description of pattern(s) and/or profile(s)

Consensus pattern [LIVMF]-[NST]-[KRHST]-[LIVM]-x-[LIVMF](2)-G-[NHRK]-
[LIVMA]-[GST]-x-[DENT]
Sequences known to belong to this class detected by the pattern ALL.
Other sequence(s) detected in SWISS-PROT NONE.

Consensus pattern T-x-W-[HY]-[RNS]-[LIVM]-x-[LIVMF]-[FY]-[NGKR]
Sequences known to belong to this class detected by the pattern A majority.
Other sequence(s) detected in SWISS-PROT NONE.

Last update

December 1999 / Patterns and text revised.

References

[1]
Meyer R.R., Laine P.S.
Microbiol. Rev. 54:342-380(1990).

[2]
Stroumbakis N.D., Li Z., Tolias P.P.
Gene 143:171-177(1994).


START 




START domain 


Accession number: PF01852
Definition: START domain
Author: SMART
Alignment method of seed: Manual
Source of seed members: Alignment kindly provided by SMART
Gathering cutoffs: 25 25
Trusted cutoffs: 106.20 106.20
Noise cutoffs: -20.90 -20.90
HMM build command line: hmmbuild HMM SEED
HMM build command line: hmmcalibrate --seed 0 HMM
Reference Number: [1]
Reference Medline: 99257451
Reference Title: START: a lipid-binding domain in StAR, HD-ZIP and
Reference Title: signalling proteins.
Reference Author: Ponting CP, Aravind L;
Reference Location: Trends Biochem Sci 1999;24:130-132.
Database reference: SMART; START;
Database Reference INTERPRO; IPR002913;
Number of members: 41


Sterol_desat




Sterol desaturase 


Accession number: PF01598
Definition: Sterol desaturase
Author: Bateman A
Alignment method of seed: Clustalw
Source of seed members: Pfam-B_^905 (release 4.1)
Gathering cutoffs: -13 -13
Trusted cutoffs: 12.90 12.90
Noise cutoffs: -44.50 -44.50
HMM build command line: hmmbuild -F HMM SEED
HMM build command line: hmmcalibrate --seed 0 HMM
Reference Number: [1]
Reference Medline: 91323727
Reference Title: Cloning, disruption and sequence of the gene encoding yeast
Reference Title: C-5 sterol desaturase.
Reference Author: Arthington BA, Bennett LG, Skatrud PL, Guynn CJ, Barbuch
Reference Author: RJ, Ulbright CB, Bard M;
Reference Location: Gene 1991;102:39-44.
Reference Number: [2]
Reference Medline: 96133902
Reference Title: Cloning and characterization of ERG25, the Saccharomyces
Reference Title: cerevisiae gene encoding C-4 sterol methyl oxidase.
Reference Author: Bard M, Bruner DA, Pierson CA, Lees ND, Biermann B, Frye L,
Reference Author: Koegel C, Barbuch R;
Reference Location: Proc Natl Acad Sci U S A 1996;93:186-190.
Reference Number: [3]
Reference Medline: 96351930
Reference Title: Molecular characterization of the CER1 gene of arabidopsis
Reference Title: involved in epicuticular wax biosynthesis and pollen
Reference Title: fertility.
Reference Author: Aarts MG, Keijzer CJ, Stiekema WJ, Pereira A;
Reference Location: Plant Cell 1995;7:2115-2127.
Database Reference INTERPRO; IPR001541;
Database reference: PFAMB; PB041851;
Comment: This family includes C-5 sterol desaturase and C-4 sterol methyl
Comment: oxidase. Members of this family are involved in cholesterol biosynthesis
Comment: and biosynthesis of plant cuticular wax. These enzymes contain many
Comment: conserved histidine residues. Members of this family are integral
Comment: membrane proteins.
Number of members: 34


Sulfatase 


PDOC00117 


Sulfatases signatures 


Sulfatases (EC 3.1.6.-) are enzymes that hydrolyze various sulfate esters. The
sequences of different types of sulfatases are available. These enzymes are:

- Arylsulfatase A (EC 3.1.6.8) (ASA), a lysosomal enzyme which hydrolyzes
  cerebroside sulfate.
- Arylsulfatase B (EC 3.1.6.12) (ASB), a lysosomal enzyme which hydrolyzes
  the sulfate ester group from N-acetylgalactosamine 4-sulfate residues of
  dermatan sulfate.
- Arylsulfatase D (ASD).
- Arylsulfatase E (ASE).
- Steryl-sulfatase (EC 3.1.6.2) (STS) (arylsulfatase C), a membrane bound
  microsomal enzyme which hydrolyzes 3-beta-hydroxy steroid sulfates.
- Iduronate 2-sulfatase precursor (EC 3.1.6.13) (IDS), a lysosomal enzyme
  that hydrolyzes the 2-sulfate groups from non-reducing-terminal iduronic
  acid residues in dermatan sulfate and heparan sulfate.
- N-acetylgalactosamine-6-sulfatase (EC 3.1.6.4), an enzyme that hydrolyzes
  the 6-sulfate groups of the N-acetyl-D-galactosamine 6-sulfate units of
  chondroitin sulfate and the D-galactose 6-sulfate units of keratan sulfate.
- Choline sulfatase (EC 3.1.6.6) (gene betC), a bacterial enzyme that
  converts choline-O-sulfate to choline.
- Glucosamine-6-sulfatase (EC 3.1.6.14) (G6S), a lysosomal enzyme that
  hydrolyzes the N-acetyl-D-glucosamine 6-sulfate units of heparan sulfate
  and keratan sulfate.
- N-sulphoglucosamine sulphohydrolase (EC 3.10.1.1) (sulphamidase), the
  lysosomal enzyme that catalyzes the hydrolysis of N-sulfo-D-glucosamine into
  glucosamine and sulfate.
- Sea urchin embryo arylsulfatase (EC 3.1.6.1).
- Green alga arylsulfatase (EC 3.1.6.1), an enzyme which plays an important
  role in the mineralization of sulfates.
- Arylsulfatase (EC 3.1.6.1) from Escherichia coli (gene aslA), Klebsiella
  aerogenes (gene atsA) and Pseudomonas aeruginosa (gene atsA).
- Escherichia coli hypothetical protein yidJ.

It has been shown that all these sulfatases are structurally related [1,2,3].

As signature patterns for that family of enzymes we have selected the two best
conserved regions. Both regions are located in the N-terminal section of these
enzymes. The first region contains a conserved arginine which could be
implicated in the catalytic mechanism; it is located four residues after a
position that, in eukaryotic sulfatases, is a conserved cysteine which has
been shown [4] to be modified to 2-amino-3-oxopropionic acid. In prokaryotes,
this cysteine is replaced by a serine.
Description of pattern(s) and/or profile(s)

Consensus pattern [SAP]-[LIVMST]-[CS]-[STAC]-P-[STA]-R-x(2)-[LIVMFW](2)-
[TAR]-G [R is a putative active site residue]
Sequences known to belong to this class detected by the pattern ALL.
Other sequence(s) detected in SWISS-PROT NONE.

Consensus pattern G-[YV]-x-[ST]-x(2)-[IVAS]-G-K-x(0,1)-[FYWMK]-[HL]
Sequences known to belong to this class detected by the pattern ALL.
Other sequence(s) detected in SWISS-PROT NONE.

Last update

December 1999 / Patterns and text revised.

References 

[1]
Peters C., Schmidt B., Rommerskirch W., Rupp K., Zuehlsdorf M.,
Vingron M., Meyer H.E., Pohlmann R., von Figura K.
J. Biol. Chem. 265:3374-3381(1990).

[2]
Wilson P.J., Morris C.P., Anson D.S., Occhiodoro T., Bielicki J.,
Clements P.R., Hopwood J.J.
Proc. Natl. Acad. Sci. U.S.A. 87:8531-8535(1990).

[3]
de Hostos E.L., Schilling J., Grossman A.R.
Mol. Gen. Genet. 218:229-239(1989).

[4]
Selmer T., Hallmann A., Schmidt B., Sumper M., von Figura K.
Eur. J. Biochem. 238:341-345(1996).


Sulfate_transp 


PDOC00870


Sulfate transporters 
signature 


A number of proteins involved in the transport of sulfate across a membrane
as well as some yet uncharacterized proteins have been shown [1,2] to be
evolutionarily related. These proteins are:

- Neurospora crassa sulfate permease II (gene cys-14).
- Yeast sulfate permeases (genes SUL1 and SUL2).
- Rat sulfate anion transporter 1 (SAT-1).
- Mammalian DTDST, a probable sulfate transporter which, in Human, is
  involved in the genetic disease, diastrophic dysplasia (DTD).
- Sulfate transporters 1, 2 and 3 from the legume Stylosanthes hamata.
- Human pendrin (gene PDS), which is involved in a number of hearing loss
  genetic diseases.
- Human protein DRA (Down-Regulated in Adenoma).
- Soybean early nodulin 70.
- Escherichia coli hypothetical protein ychM.
- Caenorhabditis elegans hypothetical protein F41D9.5.

As expected by their transport function, these proteins are highly hydrophobic
and seem to contain about 12 transmembrane domains. The best conserved region
seems to be located in the second transmembrane region and is used as a
signature pattern.

Description of pattern(s) and/or profile(s)

Consensus pattern [PAV]-x-Y-[GS]-L-Y-[STAG](2)-x(4)-[LIVFYA]-[LIVST]-[YI]-
x(3)-[GA]-[GST]-S-[KR]
Sequences known to belong to this class detected by the pattern ALL.
Other sequence(s) detected in SWISS-PROT NONE.

Last update

July 1999 / Pattern and text revised.

References 

[1]
Sandal N.N., Marcker K.A.
Trends Biochem. Sci. 19:19-19(1994).

[2]
Smith F.W., Hawkesford M.J., Prosser I.M., Clarkson D.T.
Mol. Gen. Genet. 247:709-715(1995).


Synuclein


Accession number: PF01387
Definition: Synuclein
Author: Bateman A
Alignment method of seed: Clustalw
Source of seed members: [1]
Gathering cutoffs: 25 25
Trusted cutoffs: 197.80 197.80
Noise cutoffs: -33.80 -33.80
HMM build command line: hmmbuild HMM SEED
HMM build command line: hmmcalibrate --seed 0 HMM
Reference Number: [1]
Reference Medline: 98424410
Reference Title: The synuclein family.
Reference Author: Lavedan C;
Reference Location: Genome Res 1998;8:871-880.
Database Reference INTERPRO; IPR001058;
Comment: There are three types of synucleins in humans, these
Comment: are called alpha, beta and gamma. Alpha synuclein has
Comment: been found mutated in families with autosomal dominant
Comment: Parkinson's disease. A peptide of alpha synuclein has
Comment: also been found in amyloid plaques in Alzheimer's
Comment: patients.
Number of members: 12

PDOC00479


TEA domain signature 


The TEA domain [1,E1] is a DNA-binding region of about 66 to 68 amino acids
which has been found in the N-terminal section of the following nuclear
regulatory proteins:

- Mammalian enhancer factor TEF-1. TEF-1 can bind to two distinct sequences
  in the SV40 enhancer and is a transcriptional activator.
- Mammalian TEF-3, TEF-4 and TEF-5 [2], putative transcriptional activators
  highly similar to TEF-1.
- Drosophila scalloped protein (gene sd), a probable transcription factor
  that functions in the regulation of cell-specific gene expression during
  Drosophila development, particularly in the differentiation of the nervous
  system [3].
- Emericella nidulans regulatory protein abaA. AbaA is involved in the
  regulation of conidiation (asexual spore); its expression leads to the
  cessation of vegetative growth.
- Yeast trans-acting factor TEC1. TEC1 is involved in the activation of the
  Ty1 retrotransposon.
- Caenorhabditis elegans hypothetical protein F28B12.2.

As a signature pattern, we have used positions 39 to 67 of the TEA domain.

Description of pattern(s) and/or profile(s)

Consensus pattern G-R-N-E-L-I-x(2)-Y-I-x(3)-[TC]-x(3)-R-T-[RK](2)-Q-[LIVM]-
S-S-H-[LIVM]-Q-V
Sequences known to belong to this class detected by the pattern ALL.
Other sequence(s) detected in SWISS-PROT NONE.

Last update

November 1997 / Pattern and text revised.

References 

[1]
Buerglin T.R.
Cell 66:11-12(1991).

[2]
Jacquemin P., Hwang J.-J., Martial J.A., Dolle P., Davidson I.
J. Biol. Chem. 271:21775-21785(1996).

[3]
Campbell S.D., Inamdar M., Rodrigues V., Raghavan V.,
Palazzolo M., Chovnick A.
Genes Dev. 6:367-379(1992).

[E1]
http://transfac.gbf-braunschweig.de/cgi-bin/qt/getEntry.pl?C0024


Queuine tRNA-
ribosyltransferase


Accession number: PF01702
Definition: Queuine tRNA-ribosyltransferase
Author: Bashton M, Bateman A
Alignment method of seed: Clustalw
Source of seed members: Pfam-B_1643 (release 4.1)
Gathering cutoffs: -132 -132
Trusted cutoffs: -110.00 -110.00
Noise cutoffs: -155.40 -155.40
HMM build command line: hmmbuild -F HMM SEED
HMM build command line: hmmcalibrate --seed 0 HMM
Reference Number: [1]
Reference Medline: 96256303
Reference Title: Crystal structure of tRNA-guanine transglycosylase: RNA
Reference Title: modification by base exchange.
Reference Author: Romier C, Reuter K, Suck D, Ficner R;
Reference Location: EMBO J 1996;15:2850-2857.
Reference Number: [2]
Reference Medline: 93287116
Reference Title: tRNA-guanine transglycosylase from Escherichia coli.
Reference Title: Overexpression, purification and quaternary structure.
Reference Author: Garcia GA, Koch KA, Cheng S;
Reference Location: J Mol Biol 1993;231:489-497.
Database Reference: SCOP; 1pud; fa; [SCOP-USA][CATH-PDBSUM]
Database Reference INTERPRO; IPR002616;
Database Reference PDB; 1etz A; 138; 379;
Database Reference PDB; 1enu A; 138; 379;
Database Reference PDB; 1pud ; 138; 379;
Database Reference PDB; 1wkd ; 138; 379;
Database Reference PDB; 1wke ; 138; 379;
Database Reference PDB; 1wkf ; 138; 379;
Database reference: PFAMB; PB037884;
Comment: This is a family of queuine tRNA-ribosyltransferases
Comment: EC:2.4.2.29, also known as tRNA-guanine transglycosylase
Comment: and guanine insertion enzyme.
Comment: Queuine tRNA-ribosyltransferase modifies tRNAs for asparagine,
Comment: aspartic acid, histidine and tyrosine with queuine.
Comment: It catalyses the exchange of guanine-34 at the wobble position with
Comment: 7-aminomethyl-7-deazaguanine, and the addition of a cyclopentenediol
Comment: moiety to 7-aminomethyl-7-deazaguanine-34 tRNA; giving a hypermodified
Comment: base queuine in the wobble position [1,2].
Comment: The aligned region contains a zinc binding motif C-x-C-x2-C-x29-H,
Comment: and important tRNA and 7-aminomethyl-7-deazaguanine binding residues [1].
Number of members: 24
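
The zinc binding motif C-x-C-x2-C-x29-H quoted in the comment fixes the spacing between four residues, so candidate sites can be located with a regular expression. The sketch below assumes that `x` means any single residue and that `x2` and `x29` mean exactly 2 and 29 residues; the function name is hypothetical, and a match shows only that the spacing is satisfied, not that zinc is bound.

```python
import re
from typing import List, Tuple

# C-x-C-x2-C-x29-H written as a regex; the lookahead lets overlapping
# candidates be reported as well.
ZINC_MOTIF = re.compile(r"(?=(C.C.{2}C.{29}H))")


def find_zinc_motifs(sequence: str) -> List[Tuple[int, int]]:
    """Return 1-based (start, end) positions of every motif occurrence."""
    hits = []
    for match in ZINC_MOTIF.finditer(sequence):
        start = match.start() + 1                  # convert to 1-based numbering
        end = match.start() + len(match.group(1))  # inclusive 1-based end
        hits.append((start, end))
    return hits


if __name__ == "__main__":
    # Synthetic sequence built to satisfy the spacing; not a real protein.
    demo = "MK" + "CAC" + "AA" + "C" + "A" * 29 + "H" + "GG"
    print(find_zinc_motifs(demo))
```

The motif spans 36 residues from the first cysteine to the histidine, so each reported interval has that length.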


Thi4 




Thi4 family 


Accession number: PF01946
Definition: Thi4 family
Author: Enright A, Ouzounis C, Bateman A
Alignment method of seed: Clustalw
Source of seed members: Enright A
Gathering cutoffs: 25 25
Trusted cutoffs: 526.80 526.80
Noise cutoffs: -105.00 -105.00
HMM build command line: hmmbuild -F HMM SEED
HMM build command line: hmmcalibrate --seed 0 HMM
Reference Number: [1]
Reference Medline: 95050223
Reference Title: Cloning, nucleotide sequence, and regulation of
Reference Title: Schizosaccharomyces pombe thi4, a thiamine biosynthetic
Reference Title: gene.
Reference Author: Zurlinden A, Schweingruber ME;
Reference Location: J Bacteriol 1994;176:6631-6635.
Database Reference INTERPRO; IPR002922;
Comment: This family includes Swiss:P32318 a putative thiamine biosynthetic
Comment: enzyme.
Number of members: 14


ThiC family


Accession number: PF01964
Definition: ThiC family
Author: Enright A, Ouzounis C, Bateman A
Alignment method of seed: Clustalw
Source of seed members: Enright A
Gathering cutoffs: 25 25
Trusted cutoffs: 1047.20 1047.20
Noise cutoffs: -338.20 -338.20
HMM build command line: hmmbuild -F HMM SEED
HMM build command line: hmmcalibrate --seed 0 HMM
Reference Number: [1]
Reference Medline: 93163063
Reference Title: Structural genes for thiamine biosynthetic enzymes
Reference Title: (thiCEFGH) in Escherichia coli K-12.
Reference Author: Vander Horn PB, Backstrom AD, Stewart V, Begley TP;
Reference Location: J Bacteriol 1993;175:982-992.
Reference Number: [2]
Reference Medline: 99311269
Reference Title: Thiamin biosynthesis in prokaryotes.
Reference Author: Begley TP, Downs DM, Ealick SE, McLafferty FW, Van Loon AP,
Reference Author: Taylor S, Campobasso N, Chiu HJ, Kinsland C, Reddick JJ, Xi
Reference Author: J;
Reference Location: Arch Microbiol 1999;171:293-300.
Reference Number: [3]
Reference Medline: 97284509
Reference Title: Characterization of the Bacillus subtilis thiC operon
Reference Title: involved in thiamine biosynthesis.
Reference Author: Zhang Y, Taylor SV, Chiu HJ, Begley TP;
Reference Location: J Bacteriol 1997;179:3030-3035.
Database Reference INTERPRO; IPR002817;
Comment: ThiC is found within the thiamine biosynthesis operon. ThiC is
Comment: involved in pyrimidine biosynthesis [2].
Comment: ThiC catalyzes the substitution of the pyrophosphate of
Comment: 2-methyl-4-amino-5-hydroxymethylpyrimidine pyrophosphate by
Comment: 4-methyl-5-(beta-hydroxyethyl)thiazole phosphate to yield thiamine
Comment: phosphate [3].
Number of members: 12


ThiJ 




ThiJ/PfpI family


Accession number: PF01965
Definition: ThiJ/PfpI family
Author: Enright A, Ouzounis C, Bateman A
Alignment method of seed: Clustalw
Source of seed members: Enright A
Gathering cutoffs: -40.2 -40.2
Trusted cutoffs: -40.20 -40.20
Noise cutoffs: -47.00 -47.00
HMM build command line: hmmbuild HMM SEED
HMM build command line: hmmcalibrate --seed 0 HMM
Reference Number: [1]
Reference Medline: 97039868
Reference Title: The thiJ locus and its relation to phosphorylation of
Reference Title: hydroxymethylpyrimidine in Escherichia coli.
Reference Author: Mizote T, Tsuda M, Nakazawa T, Nakayama H;
Reference Location: Microbiology 1996;142:2969-2974.
Reference Number: [2]
Reference Medline: 96196168
Reference Title: Sequence, expression in Escherichia coli, and analysis of
Reference Title: the gene encoding a novel intracellular protease (PfpI)
Reference Title: from the hyperthermophilic archaeon Pyrococcus furiosus.
Reference Author: Halio SB, Blumentals II, Short SA, Merrill BM, Kelly RM;
Reference Location: J Bacteriol 1996;178:2605-2612.
Database Reference INTERPRO; IPR002818;
Database reference: PFAMB; PB002774;
Database reference: PFAMB; PB007213;
Database reference: PFAMB; PB041784;
Comment: This family includes ThiJ, a thiamine biosynthesis
Comment: enzyme [1] that catalyses the phosphorylation of
Comment: hydroxymethylpyrimidine (HMP) to HMP monophosphate EC:2.7.1.49.
Comment: The family also includes the protease PfpI Swiss:Q51732 [2].
Number of members: 34


Thr_dehydrat_C




C-terminal domain of 
Threonine dehydratase 


Accession number: PF00585
Definition: C-terminal domain of Threonine dehydratase
Previous Pfam IDs: Thr_dehydratase_C;
Author: Bateman A
Alignment method of seed: Clustalw
Source of seed members: Bateman A
Gathering cutoffs: 25 25
Trusted cutoffs: 99.90 51.30
Noise cutoffs: -1.10 -1.10
HMM build command line: hmmbuild HMM SEED
HMM build command line: hmmcalibrate --seed 0 HMM
Reference Number: [1]
Reference Medline: 98230745
Reference Title: Structure and control of pyridoxal phosphate dependent
Reference Title: allosteric threonine deaminase.
Reference Author: Gallagher DT, Gilliland GL, Xiao G, Zondlo J, Fisher KE,
Reference Author: Chinchilla D, Eisenstein E;
Reference Location: Structure 1998;6:465-475.
Database Reference: SCOP; 1tdj; fa; [SCOP-USA][CATH-PDBSUM]
Database Reference INTERPRO; IPR001721;
Database Reference PDB; 1tdj ; 424; 512;
Database Reference PDB; 1tdj ; 329; 419;
Comment: -!- Threonine dehydratases PALP all contain a carboxy
Comment: terminal region. This region may have a regulatory role.
Comment: Some members contain two copies of this region.
Number of members: 30


thymidylat_synt 


PDOC00086 


Thymidylate synthase 
active site 


Thymidylate synthase (EC 2.1.1.45) [1,2] catalyzes the reductive methylation
of dUMP to dTMP with concomitant conversion of 5,10-methylenetetrahydrofolate
to dihydrofolate. Thymidylate synthase plays an essential role in DNA
synthesis and is an important target for certain chemotherapeutic drugs.
Thymidylate synthase is an enzyme of about 30 to 35 Kd in most species except
in protozoan and plants where it exists as a bifunctional enzyme that includes
a dihydrofolate reductase domain.

A cysteine residue is involved in the catalytic mechanism (it covalently binds
the 5,6-dihydro-dUMP intermediate). The sequence around the active site of
this enzyme is conserved from phages to vertebrates.

Description of pattern(s) and/or profile(s)

Consensus pattern R-x(2)-[LIVM]-x(3)-[FW]-[QN]-x(8,9)-[LV]-x-P-C-[HAVM]-
x(3)-[QMT]-[FYW]-x-[LV] [C is the active site residue]
Sequences known to belong to this class detected by the pattern ALL.
Other sequence(s) detected in SWISS-PROT NONE.
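
Because the pattern marks one position as the active site residue, a match can be used to report where that cysteine falls in a sequence. The following sketch hand-translates the consensus pattern above into a regular expression with a capture group around the cysteine. The function name and the demonstration sequence are hypothetical; the sequence is synthetic and was built only to satisfy the pattern.

```python
import re
from typing import Optional

# Thymidylate synthase consensus pattern as a regex; group 1 is the
# active site cysteine, and x(8,9) becomes .{8,9}.
TS_ACTIVE_SITE = re.compile(
    r"R.{2}[LIVM].{3}[FW][QN].{8,9}[LV].P(C)[HAVM].{3}[QMT][FYW].[LV]"
)


def active_site_position(sequence: str) -> Optional[int]:
    """Return the 1-based position of the active site Cys, or None if no match."""
    match = TS_ACTIVE_SITE.search(sequence)
    if match is None:
        return None
    return match.start(1) + 1


if __name__ == "__main__":
    # Synthetic example, not a real thymidylate synthase sequence.
    demo = "MA" + "RAALAAAFQ" + "A" * 8 + "LAPCHAAAQFAL" + "AA"
    print(active_site_position(demo))
```

Reporting the position of the captured group rather than the start of the whole match matters here, because the variable gap x(8,9) means the cysteine is not at a fixed offset from the first residue of the pattern.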

Last update

November 1997 / Pattern and text revised.

References

[1]
Benkovic S.J.
Annu. Rev. Biochem. 49:227-251(1980).

[2]
Ross P., O'Gara F., Condon S.
Appl. Environ. Microbiol. 56:2156-2163(1990).


Top6A 




Type II DNA
topoisomerase


Accession number: PF01962
Definition: Type II DNA topoisomerase
Author: Enright A, Ouzounis C, Bateman A
Alignment method of seed: Clustalw
Source of seed members: Enright A
Gathering cutoffs: -99 -99
Trusted cutoffs: -40.40 -40.40
Noise cutoffs: -158.40 -158.40
HMM build command line: hmmbuild -F HMM SEED
HMM build command line: hmmcalibrate --seed 0 HMM
Reference Number: [1]
Reference Medline: 97238688
Reference Title: An atypical topoisomerase II from Archaea with implications
Reference Title: for meiotic recombination [see comments]
Reference Author: Bergerat A, de Massy B, Gadelle D, Varoutas PC, Nicolas A,
Reference Author: Forterre P;
Reference Location: Nature 1997;386:414-417.
Database Reference: SCOP; 1d3y; fa; [SCOP-USA][CATH-PDBSUM]
Database Reference INTERPRO; IPR002815;
Database Reference PDB; 1d3y A; 77; 363;
Database Reference PDB; 1d3y B; 77; 363;
Comment: Members of this family are the A subunit from type II DNA
Comment: topoisomerases. Type II DNA topoisomerases catalyse the relaxation
Comment: of DNA supercoiling by causing transient double strand breaks.
Comment: The family includes topoisomerase VI subunit A from archaebacteria
Comment: Swiss:Q57815 EC:5.99.1.3 and SPO11 from yeast Swiss:P23179.
Comment: A conserved tyrosine is thought to be involved in breaking the
Comment: double stranded DNA [1].
Number of members: 9


Topoisom_bac 


PDOC00333 


Prokaryotic DNA
topoisomerase I active
site


DNA topoisomerase I (EC 5.99.1.2) [1,2,3,4,E1] is one of the two types of
enzyme that catalyze the interconversion of topological DNA isomers. Type I
topoisomerases act by catalyzing the transient breakage of DNA, one strand at
a time, and the subsequent rejoining of the strands. When a prokaryotic type I
topoisomerase breaks a DNA backbone bond, it simultaneously forms a protein-
DNA link where the hydroxyl group of a tyrosine residue is joined to a 5'-
phosphate on DNA, at one end of the enzyme-severed DNA strand.

Prokaryotic organisms, such as Escherichia coli, have two type I topoisomerase
isozymes: topoisomerase I (gene topA) and topoisomerase III (gene topB).
Eukaryotes also contain homologs of prokaryotic topoisomerase III.

There are a number of conserved residues in the region around the active site
tyrosine; we used this region as a signature pattern.
Description of pattern(s) and/or profi(e(s) 

Consensus pattern [EQ]-x-L-Y-[DEQSTI-x(3,12)-[LIV]-[ST]-Y-x-R- 
[ST1-[DEQS] Uhe second Y is the active site tyrosine] 
Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT NONE.
Last update 

December 1999 / Pattern and text revised. 
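
A consensus pattern written in this notation maps almost directly onto a regular expression. The sketch below is only an illustration and is not part of the database entry: the helper name `prosite_to_regex` and the test peptide are made up for this example, and only the syntax elements that appear in this document (x, [..], {..} and repeat counts) are handled.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression.

    Hypothetical helper: handles x, [ABC], {ABC} and (n) / (n,m) repeats only.
    """
    pieces = []
    for element in pattern.replace(" ", "").split("-"):
        # Split an element such as "x(3,12)" into its core and repeat count.
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.groups()
        if core == "x":
            core = "."                      # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # any residue except those listed
        # "[ABC]" and single letters are already valid regex fragments.
        if low and high:
            core += "{%s,%s}" % (low, high)
        elif low:
            core += "{%s}" % low
        pieces.append(core)
    return "".join(pieces)

topo_pattern = "[EQ]-x-L-Y-[DEQST]-x(3,12)-[LIV]-[ST]-Y-x-R-[ST]-[DEQS]"
topo_regex = prosite_to_regex(topo_pattern)

# Made-up peptide (not a real protein) that contains the motif once.
toy_sequence = "AAEALYDGGGGLSYARSDAA"
hit = re.search(topo_regex, toy_sequence)
print(topo_regex)
print(hit.group(0) if hit else "no match")
```

The same helper can be applied to the other consensus patterns listed in this document, provided the pattern text is first cleaned of any annotation in square brackets such as "[The second Y is the active site tyrosine]".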

References 

[1] 

Sternglanz R. 

Curr. Opin. Cell Biol. 1:533-535(1990).
[2]

Sharma A., Mondragon A.

Curr. Opin. Struct. Biol. 5:39-47(1995).

[3]

Bjornsti M.-A.

Curr. Opin. Struct. Biol. 1:99-103(1991). 
[4] 

Roca J. 

Trends Biochem. Sci. 20:156-160(1995). 
[E1] 

http://ellington.pharm.arizona.edu/~bear/top/topo.html


toxin_3 




long chain scorpion 
toxins 


Accession number: PF00537 

Definition: long chain scorpion toxins 

Author: Bateman A 

Alignment method of seed: Manual 

Source of seed members: Arne Elofsson. 

Gathering cutoffs: 25 25 

Trusted cutoffs: 59.50 69.50 

Noise cutoffs: -3.80 -3.80 

HMM build command line: hmmbuild HMM SEED 

HMM build command line: hmmcalibrate -seed 0 HMM 

Database Reference: SCOP; 2sn3; fa; [SCOP-USA][CATH-PDBSUM]

Database Reference INTERPRO; IPR002061;

Comment: Scorpion toxins bind to sodium channels
and inhibit the activation

Comment: mechanisms of the channels, thereby
blocking neuronal transmission.
Number of members: 77 
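
The Pfam-style entries in this document all follow the same "Field: value" layout, with repeated fields (for example Comment) continuing a single piece of text. As a rough sketch, assuming a cleaned-up entry in which every line carries its field name, such a record could be read into a dictionary as follows. The function name `parse_pfam_entry` and the abbreviated sample text are illustrative assumptions, not part of Pfam.

```python
from collections import defaultdict

def parse_pfam_entry(text: str) -> dict:
    """Collect 'Field: value' lines; repeated fields are joined with spaces."""
    fields = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue  # skip blank lines and lines without a field name
        key, _, value = line.partition(":")
        fields[key.strip()].append(value.strip())
    return {key: " ".join(values) for key, values in fields.items()}

# Abbreviated, hand-cleaned copy of the toxin_3 entry above.
sample = """
Accession number: PF00537
Definition: long chain scorpion toxins
Gathering cutoffs: 25 25
Comment: Scorpion toxins bind to sodium channels and inhibit the activation
Comment: mechanisms of the channels, thereby blocking neuronal transmission.
Number of members: 77
"""

entry = parse_pfam_entry(sample)
print(entry["Accession number"])
print(entry["Comment"])
```

Lines that wrap without a field name, as in the raw text above, would need to be joined to the preceding line first; that step is left out of this sketch.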


Translin 




Translin family 


Accession number: PF01997 

Definition: Translin family 

Previous Pfam IDs: DUF130; 

Author: Enright A, Ouzounis C, Bateman A 

Alignment method of seed: Clustalw 

Source of seed members: Enright A 

Gathering cutoffs: 25 25 

Trusted cutoffs: 299.50 299.50 

Noise cutoffs: -72.40 -72.40 

HMM build command line: hmmbuild HMM SEED 

HMM build command line: hmmcalibrate -seed 0 HMM

Reference Number: [1]

Reference Medline: [illegible]

Reference Title: Isolation and characterization of a cDNA 
encoding a 

Reference Title: Translin-like protein, TRAX.
Reference Author: Aoki K, Ishida R, Kasai M;
Reference Location: FEBS Lett 1997;401:109-112.
Database Reference INTERPRO; IPR002848;
Comment: Members of this family include Translin
Swiss:Q15631 that interacts

Comment: with DNA and forms a ring around the DNA.
This family also includes

Comment: Swiss:Q99598, that was found to interact
with translin in a yeast

Comment: two-hybrid screen [1].

Number of members: 10


Transposase_19 




Transposase 19


Members of this family are capable of in vitro and/or in vivo 
insertion of a donor polynucleotide into a target polynucleotide. 
Such biological activity is useful for inserting DNA into host 
genome, for example, for cloning purposes to generate a desired 
vector in vitro. 


TRANSPOSASE_IS30


PDOC00801


Transposases, IS30
family, signature


Autonomous mobile genetic elements such as transposons or
insertion sequences

(IS) encode an enzyme, called transposase, required for excising
and inserting

the mobile element. On the basis of sequence similarities,
transposases can be

grouped into various families. One of these families has been
shown [1,2] to

consist of transposases from the following elements:

- IS30 from Escherichia coli.

- IS1086 from Alcaligenes eutrophus.

- IS1161 from Streptococcus salivarius.

- IS4351 (Tn4551) from Bacteroides fragilis.

These transposases are proteins of 340 to 380 amino acids. The
best conserved

region is located in their C-terminal section and is used as a
signature pattern.

Description of pattern(s) and/or profile(s)

Consensus pattern R-G-x(2)-E-N-x-N-G-[LIVM](2)-R-[QE]-
[LIVMFY](2)-P-K

Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 
Last update 

November 1995 / First entry. 

References 

[1] 

Dong Q., Sadouk A., van der Lelie D., Taghavi S., Ferhat A.,
Nuyten J.M., Borremans B., Mergeay M., Toussaint A.
J. Bacteriol. 174:8133-8138(1992).

[2]

Giffard P.M., Rathsam C., Kwan E., Kwan D.W.L., Bunny K.L.,
Koo S.-P., Jacques N.A.

J. Gen. Microbiol. 139:913-920(1993).


Transthyretin 


PDOC00617 


Transthyretin signatures 


Transthyretin (prealbumin) [1] is a thyroid hormone-binding 
protein that seems 

to transport thyroxine (T4) from the bloodstream to the brain. It is 
a protein 

of about 130 amino acids that assembles as a homotetramer 
and forms an 

internal channel that binds thyroxine. Transthyretin is mainly 
synthesized in 

the brain choroid plexus. In humans, variants of the protein are
associated 

with distinct forms of amyloidosis. 

The sequence of transthyretin is highly conserved in vertebrates. 
A number of 

uncharacterized proteins also belong to this family: 

- Escherichia coli hypothetical protein yedX. 

- Bacillus subtilis hypothetical protein yunM.
- Caenorhabditis elegans hypothetical protein R09H10.3.
- Caenorhabditis elegans hypothetical protein ZK697.8.

We selected two regions as signature patterns. The first located 
in the N- 

terminal extremity starts with a lysine known to be involved in 
binding T4. 

The second pattern is located in the C-terminal extremity.

Description of pattern(s) and/or profile(s)

Consensus pattern [KH]-[IV]-L-[DN]-x(3)-G-x-P-A-x(2)-[IV]-x-[IV]
[The K binds thyroxine]

Sequences known to belong to this class detected by the pattern
ALL.

Other sequence(s) detected in SWISS-PROT NONE.

Consensus pattern Y-[TH]-[IV]-[AP]-x(2)-L-S-[PQ]-[FYW]-[GS]-
[FY]-[QS]

Sequences known to belong to this class detected by the pattern
ALL.

Other sequence(s) detected in SWISS-PROT NONE.
Last update

July 1999 / Patterns and text revised.

References 

[1] 

Schreiber G., Richardson S.J.

Comp. Biochem. Physiol. 116B:137-160(1997).


TRM 




N2,N2- 

dimethylguanosine tRNA 
methyltransferase 


Accession number: PF02005 

Definition: N2,N2-dimethylguanosine tRNA 

methyltransferase 

Author: Enright A, Ouzounis C, Bateman A

Alignment method of seed: Clustalw 

Source of seed members: Enright A 

Gathering cutoffs: 25 25 

Trusted cutoffs: 664.60 664.60 

Noise cutoffs: -259.50 -259.50 

HMM build command line: hmmbuild -F HMM SEED

HMM build command line: hmmcalibrate -seed 0 HMM

Reference Number: [1]

Reference Medline: 98352211

Reference Title: The tRNA(guanine-26,N2-N2) 

methyltransferase (Trm1) from 

Reference Title: the hyperthermophilic archaeon Pyrococcus 
furiosus: 

Reference Title: cloning, sequencing of the gene and its 
expression in 

Reference Title: Escherichia coli. 

Reference Author: Constantinesco F, Benachenhou N, 

Motorin Y, Grosjean H; 

Reference Location: Nucleic Acids Res 1998;26:3753-3761.
Reference Number: [2]
Reference Medline: 87260951

Reference Title: Amino-terminal extension generated from
an upstream AUG

Reference Title: codon is not required for mitochondrial
import of yeast

Reference Title: N2,N2-dimethylguanosine-specific tRNA
methyltransferase.

Reference Author: Ellis SR, Hopper AK, Martin NC;
Reference Location: Proc Natl Acad Sci USA 1987;84:5172-5176.

Database Reference INTERPRO; IPR002905;

Database reference: PFAMB; PB041661;

Comment: This enzyme EC:2.1.1.32 uses S-AdoMet to
methylate tRNA.

Comment: The TRM1 gene of Saccharomyces
cerevisiae is necessary for

Comment: the N2,N2-dimethylguanosine modification
of both mitochondrial

Comment: and cytoplasmic tRNAs [1]. The enzyme is
found in both

Comment: eukaryotes and archaebacteria [2]
Number of members: 10


tRNA_bind




Putative tRNA binding 
domain 


Accession number: PF01588 

Definition: Putative tRNA binding domain 

Author: Bashton M, Bateman A 

Alignment method of seed: Clustalw 

Source of seed members: Pfam-B_482 (release 4.1)

Gathering cutoffs: 20 20 

Trusted cutoffs: 22.30 22.30 

Noise cutoffs: 18.20 18.20 

HMM build command line: hmmbuild -F HMM SEED

HMM build command line: hmmcalibrate -seed 0 HMM 

Reference Number: [1] 

Reference Medline: 97306356 

Reference Title: Human tyrosyl-tRNA synthetase shares 
amino acid sequence 

Reference Title: homology with a putative cytokine. 
Reference Author: Kleeman TA, Wei D, Simpson KL, First 
EA; 

Reference Location: J Biol Chem 1997;272:14420-14425.
Reference Number: [2]
Reference Medline: 97050848

Reference Title: The yeast protein Arc1p binds to tRNA and
functions as a

Reference Title: cofactor for the methionyl- and glutamyl-
tRNA synthetases.

Reference Author: Simos G, Segref A, Fasiolo F, Hellmuth K,
Shevchenko A,

Reference Author: Mann M, Hurt EC;
Reference Location: EMBO J 1996;15:5437-5448.
Database Reference: SCOP; 1pys; fa; [SCOP-USA][CATH-PDBSUM]

Database Reference INTERPRO; IPR002547;
Database Reference PDB; 1b70 B; 153; 247;
Database Reference PDB; 1b7y B; 153; 247;
Database Reference PDB; 1eiy B; 153; 247;
Database Reference PDB; 1pys B; 153; 247;
Database reference: PFAMB; PB010015;
Comment: This domain is found in prokaryotic
methionyl-tRNA synthetases,

Comment: prokaryotic phenylalanyl tRNA synthetases,
the yeast GU4 nucleic-binding

Comment: protein (G4p1 or p42, ARC1) [2], human
tyrosyl-tRNA synthetase [1],

Comment: and endothelial-monocyte activating
polypeptide II.

Comment: G4p1 binds specifically to tRNA form a 
complex with methionyl-tRNA 

Comment: synthetases [2]. In human tyrosyl-tRNA 
synthetase this domain may direct 

Comment: tRNA to the active site of the enzyme [2].
This domain may perform a 

Comment: common function in tRNA aminoacylation 
[1]. 

Number of members: 46 
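
Each Pfam entry above lists gathering, trusted and noise cutoffs in bits. Under the usual Pfam convention (an assumption here, since this document does not define the terms), a hit scoring at or above the gathering cutoff is accepted as a family member, the trusted cutoff is the score of the lowest-scoring accepted hit, and the noise cutoff is the score of the highest-scoring rejected hit. The sketch below applies that reading to the first value of each pair given for tRNA_bind; the function name `classify_score` is hypothetical.

```python
def classify_score(bits: float, gathering: float, trusted: float, noise: float) -> str:
    """Place a bit score relative to the three Pfam cutoffs (assumed semantics)."""
    if bits >= trusted:
        return "member, at or above trusted cutoff"
    if bits >= gathering:
        return "member, at or above gathering cutoff"
    if bits > noise:
        return "not a member, above noise cutoff"
    return "not a member, at or below noise cutoff"

# Cutoffs copied from the tRNA_bind entry above (first value of each pair).
TRNA_BIND = {"gathering": 20.0, "trusted": 22.30, "noise": 18.20}

for score in (25.0, 21.0, 19.0, 10.0):
    print(score, classify_score(score, **TRNA_BIND))
```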


tRNA-synt_2d


PDOC00363


Aminoacyl-transfer RNA
synthetases class-II
signatures


Aminoacyl-tRNA synthetases (EC 6.1.1.-) [1] are a group of
enzymes which 

activate amino acids and transfer them to specific tRNA 
molecules as the first 

step in protein biosynthesis. In prokaryotic organisms there are 
at least 

twenty different types of aminoacyl-tRNA synthetases, one for

each different 

amino acid. In eukaryotes there are generally two aminoacyl- 
tRNA synthetases 

for each different amino acid: one cytosolic form and a 
mitochondrial form. 

While all these enzymes have a common function, they are 
widely diverse in 

terms of subunit size and of quaternary structure.

The synthetases specific for alanine, asparagine, aspartic acid,
glycine, 

histidine, lysine, phenylalanine, proline, serine, and threonine are 
referred 

to as class-II synthetases [2 to 6] and probably have a common
folding pattern 

in their catalytic domain for the binding of ATP and amino acid 
which is 

different to the Rossmann fold observed for the class I 
synthetases [7]. 

Class-II tRNA synthetases do not share a high degree of
similarity, however at 

least three conserved regions are present [2,5,8]. We have 

derived signature 

patterns from two of these regions. 

Description of pattern(s) and/or profile(s)

Consensus pattern [FYH]-R-x-[DE]-x(4,12)-[RH]-x(3)-F-x(3)-[DE]
Sequences known to belong to this class detected by the pattern
the majority of class-II tRNA synthetases with the exception of
those specific for alanine, glycine as well as bacterial histidine.
Other sequence(s) detected in SWISS-PROT 43.

Consensus pattern [GSTALVF]-{DENQHRKP}-[GSTA]-[LIVMF]-
[DE]-R-[LIVMF]-x-[LIVMSTAG]-[LIVMFY]

Sequences known to belong to this class detected by the pattern
the majority of class-II tRNA synthetases with the exception of
those specific for serine and proline.

Other sequence(s) detected in SWISS-PROT 161.

Expert(s) to contact by email

Cusack S. cusack@embl-grenoble.fr



Last update 

July 1998 / Text revised. 

References 

[1] 

Schimmel P. 

Annu. Rev. Biochem. 56:125-158(1987). 
[2] 

Delarue M., Moras D. 
BioEssays 15:675-687(1993).

[3]

Schimmel P.

Trends Biochem. Sci. 16:1-3(1991).
[4]

Nagel G.M., Doolittle R.F.

Proc. Natl. Acad. Sci. U.S.A. 88:8121-8125(1991).
[5]

Cusack S., Haertlein M., Leberman R.
Nucleic Acids Res. 19:3489-3498(1991).

[6]

Cusack S.

Biochimie 75:1077-1081(1993).

[7]

Cusack S., Berthet-Colominas C., Haertlein M., Nassar N.,
Leberman R.

Nature 347:249-255(1990).
[8]

Leveque F., Plateau P., Dessen P., Blanquet S.
Nucleic Acids Res. 18:305-312(1990).


trypsin


PDOC00124


Serine proteases, trypsin
family, active sites


The catalytic activity of the serine proteases from the trypsin
family is provided by a charge relay system involving an aspartic
acid residue hydrogen-bonded to a histidine, which itself is
hydrogen-bonded to a serine. The sequences in the vicinity of the
active site serine and histidine residues are well conserved in this
family of proteases [1]. A partial list of proteases known to belong
to the trypsin family is shown below.

- Acrosin.
- Blood coagulation factors VII, IX, X, XI and XII, thrombin,
  plasminogen, and protein C.
- Chymotrypsins.
- Complement components C1r, C1s, C2, and complement
  factors B, D and I.
- Complement-activating component of RA-reactive factor.
- Cytotoxic cell proteases (granzymes A to H).
- Duodenase I.
- Elastases 1, 2, 3A, 3B (protease E), leukocyte (medullasin).
- Enterokinase (EC 3.4.21.9) (enteropeptidase).
- Hepatocyte growth factor activator.
- Hepsin.
- Glandular (tissue) kallikreins (including EGF-binding protein
  types A, B, and C, NGF-gamma chain, gamma-renin, prostate
  specific antigen (PSA) and tonin).
- Plasma kallikrein.
- Mast cell proteases (MCP) 1 (chymase) to 8.
- Myeloblastin (proteinase 3) (Wegener's autoantigen).
- Plasminogen activators (urokinase-type, and tissue-type).
- Trypsins I, II, III, and IV.
- Tryptases.
- Snake venom proteases such as ancrod, batroxobin,
  cerastobin, flavoxobin, and protein C activator.
- Collagenase from common cattle grub and collagenolytic
  protease from Atlantic sand fiddler crab.
- Apolipoprotein(a).
- Blood fluke cercarial protease.
- Drosophila trypsin like proteases: alpha, easter, snake-locus.
- Drosophila protease stubble (gene sb).
- Major mite fecal allergen Der p III.

All the above proteins belong to family S1 in the classification of
peptidases [2,E1] and originate from eukaryotic species. It should
be noted that bacterial proteases that belong to family S2A are
similar enough in the regions of the active site residues that they
can be picked up by the same patterns. These proteases are listed
below.

- Achromobacter lyticus protease I.
- Lysobacter alpha-lytic protease.
- Streptogrisin A and B (Streptomyces proteases A and B).
- Streptomyces griseus glutamyl endopeptidase II.
- Streptomyces fradiae proteases 1 and 2.

Description of pattern(s) and/or profile(s)

Consensus pattern [LIVM]-[ST]-A-[STAG]-H-C [H is the active site
residue]

Sequences known to belong to this class detected by the pattern
ALL, except for complement components C1r and C1s, pig
plasminogen, bovine protein C, rodent urokinase, ancrod, gyroxin
and two insect trypsins.

Other sequence(s) detected in SWISS-PROT 14.

Consensus pattern [DNSTAGC]-[GSTAPIMVQH]-x(2)-G-[DE]-S-
G-[GS]-[SAPHV]-[LIVMFYWH]-[LIVMFYSTANQH] [S is the active
site residue]

Sequences known to belong to this class detected by the pattern
ALL, except for 18 different proteases which have lost the first
conserved glycine.

Other sequence(s) detected in SWISS-PROT H.influenzae
protease HAP which belongs to family S6 and 3 other proteins.

Note if a protein includes both the serine and the histidine active
site signatures, the probability of it being a trypsin family serine
protease is 100%
Last update 

November 1997 / Text revised.
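
The note above says that a protein carrying both the histidine and the serine active site signatures can be assigned to the trypsin family. A minimal sketch of that check is given below, with the two consensus patterns written out by hand as regular expressions. The function name and the test peptides are invented for illustration and are not real database sequences.

```python
import re

# Histidine active site signature: [LIVM]-[ST]-A-[STAG]-H-C
HIS_SITE = re.compile(r"[LIVM][ST]A[STAG]HC")

# Serine active site signature:
# [DNSTAGC]-[GSTAPIMVQH]-x(2)-G-[DE]-S-G-[GS]-[SAPHV]-[LIVMFYWH]-[LIVMFYSTANQH]
SER_SITE = re.compile(
    r"[DNSTAGC][GSTAPIMVQH].{2}G[DE]SG[GS][SAPHV][LIVMFYWH][LIVMFYSTANQH]"
)

def has_both_trypsin_signatures(sequence: str) -> bool:
    """Return True only when both active site signatures occur in the sequence."""
    return bool(HIS_SITE.search(sequence)) and bool(SER_SITE.search(sequence))

# Made-up peptides: the first contains both motifs, the second only the
# histidine motif.
both_motifs = "MKT" + "VSAAHC" + "YKSGIQ" + "DSCQGDSGGPVV" + "CSGK"
his_only = "MKT" + "VSAAHC" + "YKSGIQ"

print(has_both_trypsin_signatures(both_motifs))
print(has_both_trypsin_signatures(his_only))
```

A sequence that matches only one of the two patterns is not excluded by this test; as the entry notes, several genuine family members miss one signature.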

References 

[1] 

Brenner S. 

Nature 334:528-530(1988).
[2]

Rawlings N.D., Barrett A.J.
Meth. Enzymol. 244:19-61(1994).

[E1]

http://www.expasy.ch/cgi-bin/lists?peptidas.txt


TYA 




TYA transposon protein 


Accession number: PF01021 

Definition: TYA transposon protein 

Author: Bateman A

Alignment method of seed: Clustalw

Source of seed members: Pfam-B_90 (release 3.0)

Gathering cutoffs: 15 15

Trusted cutoffs: 18.00 18.00

Noise cutoffs: 13.70 13.70

HMM build command line: hmmbuild -f HMM SEED

HMM build command line: hmmcalibrate --seed 0 HMM

Reference Number: [1]

Reference Medline: 97404699

Reference Title: Cryo-electron microscopy structure of yeast
Ty

Reference Title: retrotransposon virus-like particles.
Reference Author: Palmer KJ, Tichelaar W, Myers N, Burns
NR, Butcher SJ,

Reference Author: Kingsman AJ, Fuller SD, Saibil HR;
Reference Location: J Virol 1997;71:6863-6868.
Database Reference INTERPRO; IPR001042; 
Comment: Ty are yeast transposons. A 5.7kb 
transcript codes 

Comment: for p3 a fusion protein of TYA and TYB. 
The TYA 

Comment: protein is analogous to the gag protein of 
retroviruses. 

Comment: TYA a is cleaved to form 46kd protein which 
can form 

Comment: mature virion like particles [1]. 
Number of members: 62 


tyrosinase 


PDOC00398


Tyrosinase signatures 


Tyrosinase (EC 1.14.18.1) [1] is a copper monooxygenase that
catalyzes the

hydroxylation of monophenols and the oxidation of o-diphenols
to o-quinols.

This enzyme, found in prokaryotes as well as in eukaryotes, is
involved in the

formation of pigments such as melanins and other polyphenolic
compounds.

Tyrosinase binds two copper ions (CuA and CuB). Each of the
two copper ions has

been shown [2] to be bound by three conserved histidine
residues. The regions

around these copper-binding ligands are well conserved and also
shared by some

hemocyanins, which are copper-containing oxygen carriers from
the hemolymph of
many molluscs and arthropods [3,4].

At least two proteins related to tyrosinase are known to exist in 
mammals: 

- TRP-1 (TYRP1) [5], which is responsible for the conversion of
5,6-dihydroxyindole-2-carboxylic acid (DHICA) to
indole-5,6-quinone-2-carboxylic acid.

- TRP-2 (TYRP2) [6], which is the melanogenic enzyme
DOPAchrome tautomerase

(EC 5.3.3.12) that catalyzes the conversion of DOPAchrome to
DHICA. TRP-2

differs from tyrosinases and TRP-1 in that it binds two zinc ions
instead
of copper [7].

Other proteins that belong to this family are:

- Plant polyphenol oxidases (PPO) (EC 1.10.3.1) which catalyze
the oxidation
of mono- and o-diphenols to o-diquinones [8].
- Caenorhabditis elegans hypothetical protein C02G2.1.

We have derived two signature patterns for tyrosinase and 
related proteins. 

The first one contains two of the histidines that bind CuA, and is
located in

the N-terminal section of tyrosinase. The second pattern contains
a histidine

that binds CuB; that pattern is located in the central section of the
enzyme. 



Description of pattern(s) and/or profile(s) 

Consensus pattern H-x(4,5)-F-[LIVMFTP]-x-[FW]-H-R-x(2)-[LVM]-
x(3)-E [The two H's are copper ligands]

Sequences known to belong to this class detected by the pattern
ALL.

Other sequence(s) detected in SWISS-PROT NONE.

Consensus pattern D-P-x-F-[LIVMFYW]-x(2)-H-x(3)-D [H is a
copper ligand]

Sequences known to belong to this class detected by the pattern
ALL the tyrosinases as well as all the hemocyanins.
Other sequence(s) detected in SWISS-PROT NONE.
Last update 

December 1999 / Patterns and text revised. 

References 

[1] 

Lerch K. 

Prog. Clin. Biol. Res. 256:85-98(1988).
[2]

Jackman M.P., Hajnal A., Lerch K.
Biochem. J. 274:707-713(1991).

[3]

Linzen B.

Naturwissenschaften 76:206-211(1989).
[4]

Lang W.H., van Holde K.E.

Proc. Natl. Acad. Sci. U.S.A. 88:244-248(1991).

[5]

Kobayashi T., Urabe K., Winder A., Jimenez-Cervantes C.,
Imokawa G., Brewington T., Solano F., Garcia-Borron J.C.,
Hearing V.J.

EMBO J. 13:5818-5825(1994).
[6]

Jackson I.J., Chambers D.M., Tsukamoto K., Copeland N.G.,
Gilbert D.J., Jenkins N.A., Hearing V.
EMBO J. 11:527-535(1992).

[7]

Solano F., Martinez-Liarte J.H., Jimenez-Cervantes C., Garcia-
Borron J.C., Lozano J.A.

Biochem. Biophys. Res. Commun. 204:1243-1250(1994).
[8]

Cary J.W., Lax A.R., Flurkey W.H.
Plant Mol. Biol. 20:245-253(1992). 


UbiA 


PDOC00727 


UbiA prenyltransferase 
family signature 


The following prenyltransferases are evolutionarily related [1,2]:

- Bacterial 4-hydroxybenzoate octaprenyltransferase (gene ubiA).

- Yeast mitochondrial para-hydroxybenzoate-
polyprenyltransferase (gene

COQ2).

- Protoheme IX farnesyltransferase (heme O synthase) from
yeast and mammals

(gene COX10) and from bacteria (genes cyoE or ctaB). 

These proteins probably contain seven transmembrane 
segments. The best 

conserved region is located in a loop between the second and 
third of these 

segments and we used it as a signature pattern. 

Description of pattern(s) and/or profile(s)

Consensus pattern N-x(3)-[DEH]-x(2)-[LIMF]-D-x(2)-[VM]-x-R-
[ST]-x(2)-R-x(4)-G

Sequences known to belong to this class detected by the pattern
ALL.

Other sequence(s) detected in SWISS-PROT NONE.
Last update

December 1999 / Pattern and text revised.

References

[1]

Melzer M., Heide L. 

Biochim. Biophys. Acta 1212:93-102(1994). 
[2] 

Mogi T., Saiki K., Anraku Y.
Mol. Microbiol. 14:391-398(1994). 


Ubie_methyltran 


PDOC00911 


ubiE/COQ5 

methyltransferase family 
signatures 


The following methyltransferases have been shown [1] to
share regions of
similarities:

- Escherichia coli ubiE, which is involved in both ubiquinone and
menaquinone

biosynthesis and which catalyzes the S-adenosylmethionine
dependent

methylation of 2-polyprenyl-6-methoxy-1,4-benzoquinol into
2-polyprenyl-3-methyl-6-methoxy-1,4-benzoquinol and of
demethylmenaquinol into menaquinol.

- Yeast COQ5, a ubiquinone biosynthesis methyltransferase.

- Bacillus subtilis spore germination protein C2 (gene: gerCB or
gerC2), a
probable menaquinone biosynthesis methyltransferase.

- Lactococcus lactis gerC2 homolog.

- Caenorhabditis elegans hypothetical protein ZK652.9.

- Leishmania donovani amastigote-specific protein A41.

These are hydrophilic proteins of about 30 Kd (except for ZK652.9
which is 65

Kd). They can be picked up in the database by the following
patterns.

Description of pattern(s) and/or profile(s)

Consensus pattern Y-D-x-M-N-x(2)-[LIVM]-S-x(3)-H-x(2)-W
Sequences known to belong to this class detected by the pattern
ALL.

Other sequence(s) detected in SWISS-PROT NONE.

Consensus pattern R-V-[LIVM]-K-[PV]-[GM]-G-x-[LIVMF]-x(2)-
[LIVM]-E-x-S

Sequences known to belong to this class detected by the pattern
ALL.

Other sequence(s) detected in SWISS-PROT NONE.
Last update 

December 1999 / Pattern and text revised. 

References 

[1] 

Lee P.T., Hsu A.Y., Ha H.T., Clarke C.F.
J. Bacteriol. 179:1748-1754(1997). 


ubiquitin 


PDOC00271 


Ubiquitin domain 
signature and profile 


Ubiquitin [1,2,3] is a protein of seventy six amino acid residues,
found in

all eukaryotic cells and whose sequence is extremely well
conserved from

protozoan to vertebrates. It plays a key role in a variety of
cellular 

processes, such as ATP-dependent selective degradation of 
cellular proteins, 

maintenance of chromatin structure, regulation of gene 

expression, stress 

response and ribosome biogenesis. 

In most species, there are many genes coding for ubiquitin.
However they can 

be classified into two classes. The first class produces 
polyubiquitin 

molecules consisting of exact head to tail repeats of ubiquitin. The 
number of 

repeats is variable (up to twelve in a Xenopus gene). In the
majority of 

polyubiquitin precursors, there is a final amino-acid after the last 
repeat. 

The second class of genes produces precursor proteins 
consisting of a single 

copy of ubiquitin fused to a C-terminal extension protein (CEP). 
There are two 

types of CEP proteins and both seem to be ribosomal proteins. 

Ubiquitin is a globular protein, the last four C-terminal residues 
(Leu-Arg- 

Gly-Gly) extending from the compact structure to form a 'tail', 
important for 

its function. The latter is mediated by the covalent conjugation of 
ubiquitin 

to target proteins, by an isopeptide linkage between the C- 
terminal glycine 

and the epsilon amino group of lysine residues in the target 
proteins. 

There are a number of proteins which are evolutionarily related to
ubiquitin:

- Ubiquitin-like proteins from baculoviruses as well as in some
strains of 

bovine viral diarrhea viruses (BVDV). These proteins are highly 
similar to 
their eukaryotic counterparts. 

- Mammalian protein GDX [4]. GDX is composed of two 
domains, a N-terminal 

ubiquitin-like domain of 74 residues and a C-terminal domain of
83 residues 

with some similarity with the thyroglobulin hormonogenic site. 

- Mammalian protein FAU [5]. FAU is a fusion protein which
consists of a
N-terminal ubiquitin-like protein of 74 residues fused to
ribosomal protein 
S30. 

- Mouse protein NEDD-8 [6], a ubiquitin-like protein of 81 
residues. 

- Human protein BAT3, a large fusion protein of 1132 residues
that contains a

N-terminal ubiquitin-like domain.

- Caenorhabditis elegans protein ubl-1 [7]. Ubl-1 is a fusion
protein which

consists of a N-terminal ubiquitin-like protein of 70 residues
fused to 
ribosomal protein S27A. 

- Yeast DNA repair protein RAD23 [8]. RAD23 contains a N- 
terminal domain that 

seems to be distantly, yet significantly, related to ubiquitin. 

- Mammalian RAD23-related proteins RAD23A and RAD23B.
- Mammalian BCL-2 binding athanogene-1 (BAG-1). BAG-1 is
a protein of 274

residues that contains a central ubiquitin-like domain.

- Human spliceosome associated protein 114 (SAP 114 or
SF3A120). 

- Yeast protein DSK2, a protein involved in spindle pole body 
duplication and 

which contains a N-terminal ubiquitin-like domain. 

- Human protein CKAP1/TFCB, Schizosaccharomyces pombe 
protein alp11 and 

Caenorhabditis elegans hypothetical protein F53F4.3. These 
proteins contain 

a N-terminal ubiquitin domain and a C-terminal CAP-Gly 
domain (see 

<PDOC00660>). 

- Schizosaccharomyces pombe hypothetical protein 
SpAC26A3.16. This protein 

contains a N-terminal ubiquitin domain. 

- Yeast protein SMT3. 

- Human ubiquitin-like proteins SMT3A and SMT3B. 

- Human ubiquitin-like protein SMT3C (also known as PIC1; Ubl1,
Sumo-1; Gmp-1

or Sentrin). This protein is involved in targeting ranGAP1 to the
nuclear 

pore complex protein ranBP2. 

- SMT3-like proteins in plants and Caenorhabditis elegans. 

To identify ubiquitin and related proteins we have developed a 
pattern based 

on conserved positions in the central section of the sequence. A 
profile was 

also developed that spans the complete length of the ubiquitin 
domain. 

Description of pattern(s) and/or profile(s) 

Consensus pattern K-x(2)-[LIVM]-x-[DESAK]-x(3)-[LIVM]-[PA]-
x(3)-Q-x-[LIVM]-[LIVMC]-[LIVMFY]-x-G-x(4)-[DE]
Sequences known to belong to this class detected by the pattern 
ALL, except for the RAD23 and SMT3 subfamilies, BAG-1 and 
SAP 114. 

Other sequence(s) detected in SWISS-PROT NONE. 

Sequences known to belong to this class detected by the profile 
ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 

Note this documentation entry is linked to both a signature pattern 
and a profile. As the profile is much more sensitive than the 
pattern, you should use it if you have access to the necessary 
software tools to do so. 
Last update 

July 1998 / Text revised.

References

Bio/Technology 8:209-215(1990).


UPF0004


PDOC00984 


Uncharacterized protein 
family UPF0004 
signature 


The following uncharacterized proteins have been shown [1] to 

share regions of 

similarities: 

- Escherichia coli hypothetical protein yliG. 
- Escherichia coli hypothetical protein yleA and HI0019, the
corresponding 

Haemophilus influenzae protein. 

- Bacillus subtilis hypothetical protein yqeV. 

- Helicobacter pylori hypothetical protein HP0269. 

- Helicobacter pylori hypothetical protein HP0286. 

- Mycoplasma iowae hypothetical protein in 16S RNA 5'region.

- Mycobacterium tuberculosis hypothetical protein Rv2733c.

- Rickettsia prowazekii hypothetical protein RP416.

- Rickettsia prowazekii hypothetical protein RP808.

- Synechocystis strain PCC 6803 hypothetical protein slr0082.

- Synechocystis strain PCC 6803 hypothetical protein sll0996.

- Methanococcus jannaschii hypothetical protein MJ0865.

- Methanococcus jannaschii hypothetical protein MJ0867.

- Caenorhabditis elegans hypothetical protein F25B5.5.

The size of these proteins ranges from 47 to 61 Kd. They contain
six conserved

cysteines, three of which are clustered in a region that can be
used as a
signature pattern.

Description of pattern(s) and/or profile(s)

Consensus pattern [LIVM]-x-[LIVMT]-x(2)-G-C-x(3)-C-[STAN]-
[FY]-C-x-[LIVMT]-x(4)-G

Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT 2. 
Last update 

December 1999 / Pattern and text revised.

References 

[1] 

Bairoch A.
Unpublished observations (1997).


UPF0013 




Uncharacterized 
membrane protein family 
UPF0013 


Accession number: PF01554

Definition: Uncharacterized membrane protein family
UPF0013

Author: Bateman A

Alignment method of seed: Clustalw

Source of seed members: Pfam-B_163 (release 4.0)

Gathering cutoffs: -26 -26

Trusted cutoffs: -16.10 -16.10

Noise cutoffs: -36.70 -36.70

HMM build command line: hmmbuild -F HMM SEED

HMM build command line: hmmcalibrate --seed 0 HMM

Database Reference: URL; http://www.expasy.ch/cgi-bin/lists?upflist.txt;

Database Reference INTERPRO; IPR002528;

Database reference: PFAMB; PB041103;

Comment: These proteins are integral membrane 

proteins of unknown 

Comment: function. 

Number of members: 47 


UPF0019 


PDOC00949 


Uncharacterized protein 
family UPF0019 
signature 


The following uncharacterized proteins have been shown [1,2]
to be highly
similar:

- Yeast protein SNZ1, which may be involved in growth arrest
and cellular

response to nutrient limitation.
- Yeast chromosome VI hypothetical protein YFL059w.

- Yeast chromosome XIV hypothetical protein YNL333w.

- Fission yeast hypothetical protein SpAC29B12.04.

- Hevea brasiliensis ethylene-inducible protein HEVER.

- Stellaria longipes hypothetical protein H47.

- Bacillus subtilis hypothetical protein yaaD.

- Haemophilus influenzae hypothetical protein HI1647.

- Mycobacterium leprae hypothetical protein MLCL581.12c.

- Mycobacterium tuberculosis hypothetical protein MtCY1A10.27. 

- Archaeoglobus fulgidus hypothetical protein AF0508. 

- Methanococcus jannaschii hypothetical protein MJ0677. 

- Methanococcus vannielii hypothetical protein in tRNA/5S rRNA 
gene cluster. 

- Methanobacterium thermoautotrophicum hypothetical protein
Mth666.

These are hydrophilic proteins of about 32 Kd. They can be
picked up in the

database by the following pattern.
Description of pattern(s) and/or profile(s)

Consensus pattern L-P-V-[VT]-[NQL]-F-[AT]-A-G-G-[LIV]-A-T-P-
A-D-A-A-[LM]

Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 
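
Consensus patterns written in this notation map almost directly onto regular expressions. The following is a minimal sketch, assuming the usual PROSITE conventions (`x` for any residue, `[..]` for allowed residues, `{..}` for excluded residues, `(n)` or `(n,m)` for repeats, `-` between positions). The function name `prosite_to_regex` is hypothetical, the N- and C-terminal anchors `<` and `>` are not handled, and the test peptide is invented for illustration.

```python
import re

def prosite_to_regex(pattern):
    """Translate a PROSITE pattern such as 'G-x(2)-[LIV]' into a regex string."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        # Split off an optional repeat count: 'x(2)', 'x(5,8)' or '[LIV](3)'.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core in ("x", "X"):
            regex = "."                       # any residue
        elif core.startswith("{") and core.endswith("}"):
            regex = "[^" + core[1:-1] + "]"   # any residue except those listed
        else:
            regex = core                      # a literal residue or a [..] set
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(regex)
    return "".join(parts)

UPF0019 = "L-P-V-[VT]-[NQL]-F-[AT]-A-G-G-[LIV]-A-T-P-A-D-A-A-[LM]"
regex = prosite_to_regex(UPF0019)
print(regex)

# Invented peptide that contains one instance of the motif.
hit = re.search(regex, "MKLPVVNFAAGGIATPADAAMK")
print(hit.start() if hit else None)
```

A dedicated scanner is preferable for real database searches; the sketch is only meant to show how each pattern element is read.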
Last update 

July 1998 / Pattern and text revised. 

References 

[1] 

Sivasubramaniam S., Vanniasingham V.M., Tan C.T., Chua N.H.
Plant Mol. Biol. 29:173-178(1995). 

[2] 

Braun E.L., Fuge E.K., Padilla P.A., Werner-Washburne M. 
J. Bacteriol. 178:6865-6872(1996). 


UPF0047 


PDOC01018 


Uncharacterized protein 
family UPF0047 
signature 


The following uncharacterized proteins have been shown [1] to be
highly similar:

- Bacillus subtilis hypothetical protein yugU.
- Escherichia coli hypothetical protein yjbQ.
- Mycobacterium tuberculosis hypothetical protein MtCY9C4.12.
- Synechocystis strain PCC 6803 hypothetical protein sll1880.
- Archaeoglobus fulgidus hypothetical protein AF2050.
- Methanococcus jannaschii hypothetical protein MJ1081.
- Methanobacterium thermoautotrophicum hypothetical protein MTH771.
- Fission yeast hypothetical protein SpAC4A8.02c.

These are small proteins of 14 to 16 Kd. They can be picked up in 
the database 

by the following pattern. This pattern is located in the C-terminal 
part of 

these proteins. 

Description of pattern(s) and/or profile(s) 

Consensus pattern S-x(2)-[LIV]-x-[LIV]-x(2)-G-x(4)-G-T-W-Q-x-[LIV]

Sequences known to belong to this class detected by the pattern
ALL.

Other sequence(s) detected in SWISS-PROT NONE. 

Last update 

July 1998 / First entry. 

References 

[1] 

Bairoch A.

Unpublished observations (1998). 


UPF0052 




Uncharacterised protein 
family UPF0052 


Accession number: PF01933 

Definition: Uncharacterised protein family UPF0052
Author: Enright A, Ouzounis C, Bateman A

Alignment method of seed: Clustalw

Source of seed members: Enright A 

Gathering cutoffs: 25 25 

Trusted cutoffs: 263.90 263.90 

Noise cutoffs: -134.40 -134.40

HMM build command line: hmmbuild -F HMM SEED

HMM build command line: hmmcalibrate --seed 0 HMM

Database Reference INTERPRO; IPR002882; 

Number of members: 12 


UPF0057 


PDOC01013 


Uncharacterized protein 
family UPF0057 
signature 


The following uncharacterized proteins have been shown [1] to be
evolutionarily related:

- Barley low-temperature induced protein blt101.
- Lophopyrum elongatum salt-stress induced protein ESI3.
- Yeast hypothetical proteins YDL123w, YDR276c, YDR525Bw and YJL151C.
- Caenorhabditis elegans hypothetical proteins F47B7.1, T23F2.3,
  T23F2.4, T23F2.5 and ZK632.10.
- Escherichia coli hypothetical protein yqaE.
- Synechocystis strain PCC 6803 hypothetical protein ssr1169.

These are small proteins of 52 to 140 amino-acid residues that contain
two transmembrane domains. As a signature pattern we selected a region
that corresponds to the end of the first transmembrane helix.

Description of pattern(s) and/or profile(s)

Consensus pattern [LIV]-x-[STA]-[LIVF](3)-P-P-[LIVA]-[GA]-[IV]-x(4)-[GKN]

Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 
Last update 

July 1998 / First entry.

References 

[1]

Rudd K.E., Humphery-Smith I., Wasinger V.C., Bairoch A.
Electrophoresis 19:536-544(1998).


UPF0066 


PDOC01022 


Uncharacterized protein 
family UPF0066 
signature 


The following uncharacterized proteins have been shown [1] to be
evolutionarily related:

- Escherichia coli hypothetical protein yaeB and HI0510, the
  corresponding Haemophilus influenzae protein.
- Agrobacterium tumefaciens Ti plasmid protein virR.
- Pseudomonas aeruginosa protein rcsF.
- Archaeoglobus fulgidus hypothetical protein AF0241.
- Archaeoglobus fulgidus hypothetical protein AF0433.
- Methanococcus jannaschii hypothetical protein MJ1583.
- Methanobacterium thermoautotrophicum hypothetical protein MTH1797.

These are proteins of 120 to 240 amino-acid residues (with the
exception of AF0433, which is 366 residues long). As a signature
pattern we selected a conserved region in the central part of these
proteins.

Description of pattern(s) and/or profile(s) 

Consensus pattern G-[AV]-F-[STA]-x-R-[SA]-x(2)-R-P-N 
Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 

Last update 

July 1999 / First entry.

References 

[1] 

Bairoch A. 

Unpublished observations (1998). 


UPF0076 


PDOC00838 


Uncharacterized protein 
family UPF0076 
signature 


The following uncharacterized proteins have been shown [1] to share
regions of similarity:

- Goat antigen UK114, a human homolog and the corresponding rat
  protein, which is known as perchloric acid soluble protein (PSP1).
  PSP1 [2] may inhibit an initiation stage of cell-free protein
  synthesis.
- Mouse heat-responsive protein HRSP12.
- Yeast chromosome V hypothetical protein YER057C.
- Yeast chromosome IX hypothetical protein YIL051C.
- Caenorhabditis elegans hypothetical protein C23G10.2.
- Escherichia coli hypothetical protein ycdK.
- Escherichia coli hypothetical protein yhaR.
- Escherichia coli hypothetical protein yjgF and HI0719, the
  corresponding Haemophilus influenzae protein.
- Escherichia coli hypothetical protein yoaB.
- Bacillus subtilis hypothetical protein yabJ.
- Haemophilus influenzae hypothetical protein HI1627.
- Helicobacter pylori hypothetical protein HP0944.
- Lactococcus lactis aldR.
- Myxococcus xanthus dfrA.
- Synechocystis strain PCC 6803 hypothetical protein slr0709.
- Rhizobium strain NGR234 symbiotic plasmid hypothetical protein y4sK.
- Pyrococcus horikoshii hypothetical protein PH0854.

These are small proteins of around 15 Kd whose sequence is highly
conserved. As a signature pattern, we selected a well conserved region
located in the C-terminal part of these proteins.

Description of pattern(s) and/or profile(s)

Consensus pattern [PA]-[ASTPV]-R-[SACVF]-x-[LIVMFY]-x(2)-
[GSAKR]-x-[LMVA]-x(5,8)-[LIVM]-E-[MI]

Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT 4. 
Last update 

July 1999 / Pattern and text revised. 

References 

[1] 

Bairoch A.

Unpublished observations (1995).

[2]

Oka T., Tsuji H., Noda C., Sakai K., Hong Y.-M., Suzuki I., Munoz
S., Natori Y.

J. Biol. Chem. 270:30060-30067(1995). 


UPF0099 




Domain of unknown 
function UPF0099 


Accession number: PF01981 

Definition: Domain of unknown function UPF0099 

Previous Pfam IDs: DUF119;

Author: Enright A, Ouzounis C, Bateman A

Alignment method of seed: Clustalw

Source of seed members: Enright A 

Gathering cutoffs: 25 25 

Trusted cutoffs: 132.80 132.80

Noise cutoffs: -35.70 -35.70 

HMM build command line: hmmbuild -F HMM SEED 

HMM build command line: hmmcalibrate --seed 0 HMM

Database Reference INTERPRO; IPR002833; 

Comment: This domain has no known function. 

Number of members: 10 


UQ_con 


PDOC00163 


Ubiquitin-conjugating 
enzymes active site 


Ubiquitin-conjugating enzymes (EC 6.3.2.19) (UBC or E2 enzymes)
[1,2,3] catalyze the covalent attachment of ubiquitin to target
proteins. An activated ubiquitin moiety is transferred from a
ubiquitin-activating enzyme (E1) to E2, which later ligates ubiquitin
directly to substrate proteins with or without the assistance of
'N-end' recognizing proteins (E3).

In most species there are many forms of UBC (at least 9 in 
yeast) which are 

implicated in diverse cellular functions. 

A cysteine residue is required for ubiquitin-thiolester formation.
There is a single conserved cysteine in UBCs and the region around
that residue is conserved in the sequence of known UBC isozymes. We
have used that region as a signature pattern.

Description of pattern(s) and/or profile(s) 

Consensus pattern [FYWLSP]-H-[PC]-[NH]-[LIV]-x(3,4)-G-x-[LIV]-
C-[LIV]-x-[LIV] [C is the active site residue]

Sequences known to belong to this class detected by the pattern 

ALL, except for yeast UBC6 (DOA2). 

Other sequence(s) detected in SWISS-PROT NONE. 
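
Because the pattern is annotated with its active site residue, a match can also report where that cysteine lies. The following is a minimal sketch, assuming the consensus pattern reads [FYWLSP]-H-[PC]-[NH]-[LIV]-x(3,4)-G-x-[LIV]-C-[LIV]-x-[LIV]; the regular expression is a hand translation of it, the helper name `find_active_site_cysteines` is hypothetical, and the peptide is invented for illustration rather than taken from a real UBC.

```python
import re

# Hand translation of the consensus pattern; the named group marks the
# cysteine that the entry annotates as the active site residue.
UBC_ACTIVE_SITE = re.compile(
    r"[FYWLSP]H[PC][NH][LIV].{3,4}G.[LIV](?P<cys>C)[LIV].[LIV]"
)

def find_active_site_cysteines(sequence):
    """Return the 1-based position of the cysteine in every pattern match."""
    return [m.start("cys") + 1 for m in UBC_ACTIVE_SITE.finditer(sequence)]

peptide = "AAFHPNVYADGSICLDIAA"   # invented test peptide
positions = find_active_site_cysteines(peptide)
print(positions)
```

Positions are reported 1-based, as is usual for residue numbering in sequence records.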

Expert(s) to contact by email 

Jentsch S. jentsch@zmbh.uni-heidelberg.de

Last update 

July 1998 / Text revised.

References 

[1] 

Jentsch S., Seufert W., Sommer T., Reins H.-A.
Trends Biochem. Sci. 15:195-198(1990).

[2]

Jentsch S., Seufert W., Hauser H.-P.
Biochim. Biophys. Acta 1089:127-139(1991).

[3] 

Hershko A. 

Trends Biochem. Sci. 16:265-268(1991). 


urease_gamma


PDOC00133 


Urease signatures 


Urease (EC 3.5.1.5) is a nickel-binding enzyme that catalyzes the
hydrolysis of urea to carbon dioxide and ammonia [1]. Historically, it
was the first enzyme to be crystallized (in 1926). It is mainly found
in plant seeds, microorganisms and invertebrates. In plants, urease is
a hexamer of identical chains. In bacteria [2], it consists of either
two or three different subunits (alpha, beta and gamma).

Urease binds two nickel ions per subunit; four histidines, an
aspartate and a carbamated-lysine serve as ligands to these metals; an
additional histidine is involved in the catalytic mechanism [3].

As signatures for this enzyme, we selected a region that contains two
histidines that bind one of the nickel ions and the region of the
active site histidine.

Description of pattern(s) and/or profile(s) 

Consensus pattern T-[AY]-[GA]-[GAT]-[LIVM]-D-x-H-[LIVM]-H-
x(3)-P [The two H's bind nickel]

Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 

Consensus pattern [LIVM](2)-[CT]-H-[HN]-L-x(3)-[LIVM]-x(2)-D-
[LIVM]-x-F-A [H is the active site residue]

Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT NONE.
Last update 

November 1997 / Patterns and text revised. 

References 

[1] 

Takishima K., Suga T., Mamiya G. 
Eur. J. Biochem. 175:151-165(1988). 

[2] 

Mobley H.L.T., Hausinger R.P.
Microbiol. Rev. 53:85-108(1989). 

[3] 

Jabri E., Carr M.B., Hausinger R.P., Karplus P. A. 
Science 268:998-1004(1995). 


UreD 




UreD urease accessory 
protein 


Accession number: PF01774

Definition: UreD urease accessory protein 

Author: Bashton M, Bateman A 

Alignment method of seed: Clustalw 

Source of seed members: Pfam-B_1109 (release 4.2)

Gathering cutoffs: 25 25

Trusted cutoffs: 186.00 186.00

Noise cutoffs: -42.60 -42.60

HMM build command line: hmmbuild -F HMM SEED

HMM build command line: hmmcalibrate --seed 0 HMM

Reference Number: [1] 

Reference Medline: 97352660 

Reference Title: Characterization of UreG, identification of a
Reference Title: UreD-UreF-UreG complex, and evidence suggesting that a
Reference Title: nucleotide-binding site in UreG is required for in vivo
Reference Title: metallocenter assembly of Klebsiella aerogenes urease.
Reference Author: Moncrief MB, Hausinger RP;
Reference Location: J Bacteriol 1997;179:4081-4086.
Reference Number: [2] 
Reference Medline: 96146510 

Reference Title: Organization of Ureaplasma urealyticum 
urease gene cluster 

Reference Title: and expression in a suppressor strain of Escherichia coli.

Reference Author: Neyrolles O, Ferris S, Behbahani N, 
Montagnier L, Blanchard 
Reference Author: A; 

Reference Location: J Bacteriol 1996;178:647-655. 
Reference Number: [3] 
Reference Medline: 94211837

Reference Title: In vitro activation of urease apoprotein and 
role of UreD 

Reference Title: as a chaperone required for nickel 
metallocenter assembly. 

Reference Author: Park IS, Carr MB, Hausinger RP;
Reference Location: Proc Natl Acad Sci U S A 1994;91:3233-3237.

Database Reference INTERPRO; IPR002669;

Comment: UreD is a urease accessory protein. Urease hydrolyses
Comment: urea into ammonia and carbamic acid [2]. UreD is involved in
Comment: activation of the urease enzyme via the UreD-UreF-UreG-urease
Comment: complex [1] and is required for urease nickel metallocenter
Comment: assembly [3].
Comment: See also UreF and UreG (HypB_UreG).
Number of members: 23 


UreF 




UreF 


Accession number: PF01730

Definition: UreF 

Author: Bashton M, Bateman A 

Alignment method of seed: Clustalw

Source of seed members: Pfam-B_2037 (release 4.1)

Gathering cutoffs: -31 -31

Trusted cutoffs: -14.30 -14.30

Noise cutoffs: -49.30 -49.30

HMM build command line: hmmbuild -F HMM SEED

HMM build command line: hmmcalibrate --seed 0 HMM

Reference Number: [1] 

Reference Medline: 96404789 

Reference Title: Purification and activation properties of UreD-UreF-urease
Reference Title: apoprotein complexes.
Reference Author: Moncrief MB, Hausinger RP;
Reference Location: J Bacteriol 1996;178:5417-5421.
Reference Number: [2] 
Reference Medline: 96146510 

Reference Title: Organization of Ureaplasma urealyticum 
urease gene cluster 

Reference Title: and expression in a suppressor strain of Escherichia coli.

Reference Author: Neyrolles O, Ferris S, Behbahani N,
Montagnier L, Blanchard 
Reference Author: A; 

Reference Location: J Bacteriol 1996;178:647-655. 
Database Reference INTERPRO; IPR002639; 
Comment: This family consists of the Urease accessory protein
Comment: UreF. The urease enzyme (urea amidohydrolase)
Comment: hydrolyses urea into ammonia and carbamic acid [2].
Comment: UreF is proposed to modulate the activation process of
Comment: urease by eliminating the binding of nickel ions to
Comment: noncarbamylated protein [1].
Number of members: 20 


Vif 




Retroviral Vif (Viral
infectivity) protein 


Accession number: PF00559 

Definition: Retroviral Vif (Viral infectivity) protein 

Author: Bateman A 

Alignment method of seed: Clustalw 

Source of seed members: Swiss-Prot 

Gathering cutoffs: 25 25 

Trusted cutoffs: 53.90 53.90 

Noise cutoffs: 23.60 23.60 

HMM build command line: hmmbuild -f HMM SEED 

HMM build command line: hmmcalibrate --seed 0 HMM

Reference Number: [1] 

Reference Medline: 95287525 

Reference Title: Aberrant Gag protein composition of a 
human 

Reference Title: immunodeficiency virus type 1 vif mutant 
produced in 

Reference Title: primary lymphocytes. 

Reference Author: Simm M, Shahabuddin M, Chao W, Allan 

JS, Volsky DJ; 

Reference Location: J Virol 1995;69:4582-4586.
Database Reference INTERPRO; IPR000475;

Comment: Human immunodeficiency virus type 1 (HIV-1) Vif is required for

Comment: productive infection of T lymphocytes and 
macrophages. Virions 

Comment: produced in the absence of Vif have 
abnormal core morphology and 

Comment: those produced in primary T cells carry immature core 
proteins 

Comment: and low levels of mature capsid.
Number of members: 503 


Vpu 




Vpu protein 


Accession number: PF00558 

Definition: Vpu protein 

Author: Bateman A 

Alignment method of seed: Clustalw

Source of seed members: Swiss-Prot 

Gathering cutoffs: 15 15 

Trusted cutoffs: 15.50 15.50 

Noise cutoffs: 13.60 13.60

HMM build command line: hmmbuild -f HMM SEED 

HMM build command line: hmmcalibrate --seed 0 HMM 

Reference Number: [1] 

Reference Medline: 97479365 

Reference Title: Enhancement of retroviral production from packaging cell

Reference Title: lines expressing the human 
immunodeficiency type 1 VPU 
Reference Title: gene. 

Reference Author: Kobinger GP, Mouland AJ, Lalonde JP, 
Forget J, Cohen EA; 

Reference Location: Gene Ther 1997;4:868-874.
Reference Number: [2]
Reference Medline: 95156576

Reference Title: The human immunodeficiency virus type 1 
Vpu protein 

Reference Title: specifically binds to the cytoplasmic domain 
of CD4: 

Reference Title: implications for the mechanism of 
degradation. 

Reference Author: Bour S, Schubert U, Strebel K; 
Reference Location: J Virol 1995;69:1510-1520.
Reference Number: [3]
Reference Medline: 97325981 

Reference Title: Secondary structure and tertiary fold of the human
Reference Title: immunodeficiency virus protein U (Vpu) cytoplasmic domain
Reference Title: in solution.
Reference Author: Willbold D, Hoffmann S, Rosch P;
Reference Location: Eur J Biochem 1997;245:581-588.
Database Reference: SCOP; 1vpu; fa; [SCOP-USA][CATH-PDBSUM]

Database Reference INTERPRO; IPR002094;

Database Reference PDB; 1vpu; 38; 81;

Database reference: PFAMB; PB003303;

Database reference: PFAMB; PB005882; 

Comment: The Vpu protein contains an N-terminal transmembrane spanning region
Comment: and a C-terminal cytoplasmic region.
Comment: The HIV-1 Vpu protein stimulates virus
production by enhancing 

Comment: the release of viral particles from infected 
cells. 

Comment: -!- The VPU protein binds specifically to 
CD4. 

Number of members: 194


XPG_N 


PDOC00658 


XPG protein signatures 


Xeroderma pigmentosum (XP) [1] is a human autosomal recessive disease,
characterized by a high incidence of sunlight-induced skin cancer.
Skin cells of people with this condition are hypersensitive to
ultraviolet light, due to defects in the incision step of DNA excision
repair. There are a minimum of seven genetic complementation groups
involved in this pathway: XP-A to XP-G. The defect in XP-G can be
corrected by a 133 Kd nuclear protein called XPG (or XPGC) [2].

XPG belongs to a family of proteins [2,3,4,5,6] that are composed of
two main subsets:

- Subset 1, to which belong XPG, RAD2 from budding yeast and rad13
  from fission yeast. RAD2 and XPG are single-stranded DNA
  endonucleases [7,8]. XPG makes the 3' incision in human DNA
  nucleotide excision repair [9].
- Subset 2, to which belong mouse and human FEN-1, rad2 from fission
  yeast, and RAD27 from budding yeast. FEN-1 is a structure-specific
  endonuclease.

In addition to the proteins listed in the above groups, this family
also includes:

- Fission yeast exo1, a 5'->3' double-stranded DNA exonuclease that
  could act in a pathway that corrects mismatched base pairs.
- Yeast EXO1 (DHS1), a protein with probably the same function as
  exo1.
- Yeast DIN7.

Sequence alignment of this family of proteins reveals that
similarities are largely confined to two regions. The first is located
at the N-terminal extremity (N-region) and corresponds to the first 95
to 105 amino acids. The second region is internal (I-region) and found
towards the C-terminus; it spans about 140 residues and contains a
highly conserved core of 27 amino acids that includes a conserved
pentapeptide (E-A-[DE]-A-[QS]). It is possible that the conserved
acidic residues are involved in the catalytic mechanism of DNA
excision repair in XPG. The amino acids linking the N- and I-regions
are not conserved; indeed, they are largely absent from proteins
belonging to the second subset.

We have developed two signature patterns for these proteins. 
The first 

corresponds to the central part of the N-region, the second to part 
of the I- 

region and includes the putative catalytic core pentapeptide. 



Description of pattern(s) and/or profile(s)

Consensus pattern [VI]-[KRE]-P-x-[FYIL]-V-F-D-G-x(2)-[PIL]-x-
[LVC]-K

Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 

Consensus pattern [GS]-[LIVM]-[PER]-[FYS]-[LIVM]-x-A-P-x-E-A-
[DE]-[PAS]-[QS]-[CLM]

Sequences known to belong to this class detected by the pattern 
ALL. 

Other sequence(s) detected in SWISS-PROT NONE. 

Expert(s) to contact by email 

Clarkson S.G. clarkson@medecine.unige.ch

Last update

November 1997 / Patterns and text revised.


Y_phosphatase


PDOC00323 


Tyrosine specific protein 
phosphatases signature 
and profiles 


Tyrosine specific protein phosphatases (EC 3.1.3.48) (PTPase) [1 to 5]
are enzymes that catalyze the removal of a phosphate group attached to
a tyrosine residue. These enzymes are very important in the control of
cell growth, proliferation, differentiation and transformation.
Multiple forms of PTPase have been characterized and can be classified
into two categories: soluble PTPases and transmembrane receptor
proteins that contain PTPase domain(s). The currently known PTPases
are listed below:

Soluble PTPases. 

- PTPN1 (PTP-1B).
- PTPN2 (T-cell PTPase; TC-PTP).
- PTPN3 (H1) and PTPN4 (MEG), enzymes that contain an N-terminal band
  4.1-like domain (see <PDOC00566>) and could act at junctions between
  the membrane and cytoskeleton.
- PTPN5 (STEP).
- PTPN6 (PTP-1C; HCP; SHP) and PTPN11 (PTP-2C; SH-PTP3; Syp), enzymes
  which contain two copies of the SH2 domain at their N-terminal
  extremity. The Drosophila protein corkscrew (gene csw) also belongs
  to this subgroup.
- PTPN7 (LC-PTP; Hematopoietic protein-tyrosine phosphatase; HePTP).
- PTPN8 (70Z-PEP).
- PTPN9 (MEG2).
- PTPN12 (PTP-G1; PTP-P19).
- Yeast PTP1.
- Yeast PTP2 which may be involved in the ubiquitin-mediated protein
  degradation pathway.
- Fission yeast pyp1 and pyp2 which play a role in inhibiting the
  onset of mitosis.
- Fission yeast pyp3 which contributes to the dephosphorylation of
  cdc2.
- Yeast CDC14 which may be involved in chromosome segregation.
- Yersinia virulence plasmid PTPases (gene yopH).
- Autographa californica nuclear polyhedrosis virus 19 Kd PTPase.

Dual specificity PTPases. 

- DUSP1 (PTPN10; MAP kinase phosphatase-1; MKP-1), which
  dephosphorylates MAP kinase on both Thr-183 and Tyr-185.
- DUSP2 (PAC-1), a nuclear enzyme that dephosphorylates MAP kinases
  ERK1 and ERK2 on both Thr and Tyr residues.
- DUSP3 (VHR).
- DUSP4 (HVH2).
- DUSP5 (HVH3).
- DUSP6 (Pyst1; MKP-3).
- DUSP7 (Pyst2; MKP-X).
- Yeast MSG5, a PTPase that dephosphorylates MAP kinase FUS3.
- Yeast YVH1.
- Vaccinia virus H1 PTPase; a dual specificity phosphatase.

Receptor PTPases.

Structurally, all known receptor PTPases are made up of a variable
length extracellular domain, followed by a transmembrane region and a
C-terminal catalytic cytoplasmic domain. Some of the receptor PTPases
contain fibronectin type III (FN-III) repeats, immunoglobulin-like
domains, MAM domains or carbonic anhydrase-like domains in their
extracellular region. The cytoplasmic region generally contains two
copies of the PTPase domain. The first seems to have enzymatic
activity, while the second is inactive but seems to affect substrate
specificity of the first. In these domains, the catalytic cysteine is
generally conserved but some other, presumably important, residues are
not.

In the following table, the domain structure of known receptor PTPases
is shown:

                                        Extracellular       Intracellular
                                        Ig  FN-3  CAH  MAM  PTPase
Leukocyte common antigen (LCA) (CD45)   0   2     0    0    2
Leukocyte antigen related (LAR)         3   8     0    0    2
Drosophila DLAR                         3   9     0    0    2
Drosophila DPTP                         2   2     0    0    2
PTP-alpha (LRP)                         0   0     0    0    2
PTP-beta                                0   16    0    0    1
PTP-gamma                               0   1     1    0    2
PTP-delta                               0   >7    0    0    2
PTP-epsilon                             0   0     0    0    2
PTP-kappa                               1   4     0    1    2
PTP-mu                                  1   4     0    1    2
PTP-zeta                                0   1     1    0    2

PTPase domains consist of about 300 amino acids. There are 
two conserved 

cysteines, the second one has been shown to be absolutely 
required for 

activity. Furthermore, a number of conserved residues in its 
immediate 

vicinity have also been shown to be important. 

We derived a signature pattern for PTPase domains centered on the
active site cysteine.

There are three profiles for PTPases, the first one spans the 
complete domain 

and is not specific to any subtype. The second profile is specific 
to dual- 
specificity PTPases and the third one to the PTP subfamily. 

Description of pattern(s) and/or profile(s)

Consensus pattern [LIVMF]-H-C-x(2)-G-x(3)-[STC]-[STAGP]-x- 
[LIVMFY] [C is the active site residue] 

Sequences known to belong to this class detected by the pattern 

ALL, except for nine sequences. 

Other sequence(s) detected in SWISS-PROT 3. 

Sequences known to belong to this class detected by the 1st 
profile ALL 

Other sequence(s) detected in SWISS-PROT 2. 
Sequences known to belong to this class detected by the 2nd 
profile ALL dual type PTPases.

Other sequence(s) detected in SWISS-PROT NONE.

Sequences known to belong to this class detected by the 3rd
profile ALL PTP type PTPases.

Other sequence(s) detected in SWISS-PROT NONE. 

Note the M-phase inducer phosphatases (cdc25-type
phosphatase) are tyrosine-protein phosphatases that are not
structurally related to the above PTPases.

Note this documentation entry is linked to both a signature pattern 
and to profiles. As profiles are much more sensitive than the 
pattern, you should use them if you have access to the necessary 
software tools to do so. 
Last update 

July 1999 / Text revised.


Zein




Zein seed storage 
protein 


Accession number: PF01559

Definition: Zein seed storage protein 

Author: Bateman A 

Alignment method of seed: Clustalw 

Source of seed members: Pfam-B_181 (release 4.0)

Gathering cutoffs: -21 -21 

Trusted cutoffs: 4.60 4.60 

Noise cutoffs: -46.60 -46.60 

HMM build command line: hmmbuild -F HMM SEED 

HMM build command line: hmmcalibrate --seed 0 HMM

Reference Number: [1]

Reference Medline: 93197294

Reference Title: Studies of the zein-like alpha-prolamins 
based on an 

Reference Title: analysis of amino acid sequences: 
implications for their 

Reference Title: evolution and three-dimensional structure. 
Reference Author: Garratt R, Oliva G, Caracelli I, Leite A, Arruda P;

Reference Location: Proteins 1993;15:88-99.
Database Reference INTERPRO; IPR002530; 
Comment: Zeins are seed storage proteins. They are 
unusually rich in 

Comment: glutamine, proline, alanine, and leucine 
residues and their 

Comment: sequences show a series of tandem repeats [1].

Number of members: 48 


zf-AN1




AN1-like Zinc finger


Accession number: PF01428
Definition: AN1-like Zinc finger
Author: Bateman A, SMART
Alignment method of seed: Manual
Source of seed members: SMART
Gathering cutoffs: 16 16
Trusted cutoffs: 16.40 16.40
Noise cutoffs: 7.30 7.30

HMM build command line: hmmbuild HMM SEED 

HMM build command line: hmmcalibrate --seed 0 HMM 

Reference Number: [1] 

Reference Medline: 93292985 

Reference Title: Two related localized mRNAs from 

Xenopus laevis encode 

Reference Title: ubiquitin-like fusion proteins. 

Reference Author: Linnen JM, Bailey CP, Weeks DL; 

Reference Location: Gene 1993;128:181-188. 

Database reference: SMART; ZnF_AN1;

Database Reference INTERPRO; IPR000058;

Comment: Zinc finger at the C-terminus of An1 Swiss:Q91889, a ubiquitin-like
Comment: protein in Xenopus laevis.
Comment: The following pattern describes the zinc finger.
Comment: C-X2-C-X(9-12)-C-X(1-2)-C-X4-C-X2-H-X5-H-X-C

Comment: Where X can be any amino acid, and 
numbers in brackets 

Comment: indicate the number of residues. 
Number of members: 18
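
The zinc finger pattern in the comment above uses a slightly different shorthand from the PROSITE entries: X2 for two residues, X(9-12) for a range. The following is a minimal sketch of how that shorthand could be turned into a regular expression, assuming the pattern reads C-X2-C-X(9-12)-C-X(1-2)-C-X4-C-X2-H-X5-H-X-C. The function name `an1_notation_to_regex` is hypothetical and the test sequence is synthetic, with alanine used as filler between the conserved cysteines and histidines.

```python
import re

AN1_PATTERN = "C-X2-C-X(9-12)-C-X(1-2)-C-X4-C-X2-H-X5-H-X-C"

# One token per position: X(a-b), Xn, bare X, or a literal residue letter.
TOKEN = re.compile(r"X\((\d+)-(\d+)\)|X(\d+)|X|[A-WYZ]")

def an1_notation_to_regex(pattern):
    """Convert the 'X2' / 'X(9-12)' shorthand into a Python regex string."""
    parts = []
    for m in TOKEN.finditer(pattern):
        if m.group(1):                       # X(a-b): between a and b residues
            parts.append(".{%s,%s}" % (m.group(1), m.group(2)))
        elif m.group(3):                     # Xn: exactly n residues
            parts.append(".{%s}" % m.group(3))
        elif m.group(0) == "X":              # X: exactly one residue
            parts.append(".")
        else:                                # conserved Cys or His
            parts.append(m.group(0))
    return "".join(parts)

an1_regex = an1_notation_to_regex(AN1_PATTERN)
print(an1_regex)

# Synthetic sequence with the minimum allowed spacing at each variable gap.
synthetic = "C" + "AA" + "C" + "A" * 9 + "C" + "A" + "C" + "AAAA" + \
            "C" + "AA" + "H" + "AAAAA" + "H" + "A" + "C"
print(bool(re.fullmatch(an1_regex, synthetic)))
```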


zf-B_box 


PDOC50015 


B-box zinc finger 


Accession number: PF00643

Definition: B-box zinc finger.

Author: Bateman A

Alignment method of seed: pftools 

Source of seed members: Prosite 

Gathering cutoffs: 25 25 

Trusted cutoffs: 26.00 26.00 

Noise cutoffs: 24.50 29-90 

HMM build command line: hmmbuild HMM SEED 

HMM build command line: hmmcalibrate --seed 0 HMM

Database Reference: SCOP; 1fre; fa; [SCOP-USA][CATH-PDBSUM]

Database reference: PROSITE_PROFILE; PS50119;
Database Reference: PROSITE; PDOC50015
Database Reference INTERPRO; IPR002991;
Database Reference PDB; 1fre; 4; 42;
Database reference: PFAMB; PB002777;
Database reference: PFAMB; PB010625;
Database reference: PFAMB; PB041771;
Number of members: 44 


zf-CONSTANS 




CONSTANS family zinc 
finger 


Accession number: PF01760

Definition: CONSTANS family zinc finger 

Author: Bateman A 

Alignment method of seed: Clustalw 

Source of seed members: Pfam-B_1072 (release 4.2)

Gathering cutoffs: 25 10

Trusted cutoffs: 76.10 17.20

Noise cutoffs: 9.70 9.70

HMM build command line: hmmbuild HMM SEED

HMM build command line: hmmcalibrate --seed 0 HMM

Reference Number: [1]

Reference Medline: 95211836

Reference Title: The CONSTANS gene of Arabidopsis 
promotes flowering and 

Reference Title: encodes a protein showing similarities to 
zinc finger 

Reference Title: transcription factors. 

Reference Author: Putterill J, Robson F, Lee K, Simon R, 

Coupland G; 

Reference Location: Cell 1995;80:847-857. 
Database Reference INTERPRO; IPR002926;
Number of members: 45 


zf-DHHC 




DHHC zinc finger domain 


Accession number: PF01529 

Definition: DHHC zinc finger domain 

Author: Bateman A 

Alignment method of seed: Clustalw 

Source of seed members: Pfam-B_945 (release 4.0) 

Gathering cutoffs: 22 22 

Trusted cutoffs: 22.40 22.40

Noise cutoffs: -22.40 -22.40 

HMM build command line: hmmbuild HMM SEED 

HMM build command line: hmmcalibrate --seed 0 HMM

Reference Number: [1]
Reference Medline: 99250263
Reference Title: The drosophila STAM gene homolog is in a tight gene
Reference Title: cluster, and its expression correlates to that of the
Reference Title: adjacent gene ial.
Reference Author: Mesilaty-Gross S, Reich A, Motro B, Wides R;
Reference Location: Gene 1999;231:173-186.
Reference Number: [2]
Reference Medline: 97315340
Reference Title: Variations of the C2H2 zinc finger motif in the yeast
Reference Title: genome and classification of yeast zinc finger proteins.
Reference Author: Bohm S, Frishman D, Mewes HW;
Reference Location: Nucleic Acids Res 1997;25:2464-2469.
Reference Number: [3]
Reference Medline: 99321009
Reference Title: The DHHC domain: a new highly conserved cysteine-rich
Reference Title: motif.
Reference Author: Putilina T, Wong P, Gentleman S;
Reference Location: Mol Cell Biochem 1999;195:219-226.
Reference Number: [4]
Reference Medline: 10490616

Reference Title: Erf2, a Novel Gene Product That Affects the 
Localization 

Reference Title: and Palmitoylation of Ras2 in
Saccharomyces cerevisiae. 

Reference Author: Bartels DJ, Mitchell DA, Dong X, 
Deschenes RJ; 

Reference Location: Mol Cell Biol 1999;19:6775-6787. 
Database Reference INTERPRO; IPR001594;
Comment: This domain is also known as NEW1 [2]. 
This domain is 

Comment: predicted to be a zinc binding domain. The 
function 

Comment: of this domain is unknown, but it has been 
predicted to 

Comment: be involved in protein-protein or protein-DNA 
Comment: interactions [3].
Number of members: 34 


zf-MYND 




MYND finger 


Accession number: PF01753

Definition: MYND finger 

Author: Bateman A 

Alignment method of seed: Manual 

Source of seed members: Bateman A 

Gathering cutoffs: 11 11

Trusted cutoffs: 17.30 17.30

Noise cutoffs: 5.50 5.50

HMM build command line: hmmbuild HMM SEED

HMM build command line: hmmcalibrate --seed 0 HMM

Reference Number: [1]

Reference Medline: 96203118

Reference Title: DEAF-1, a novel protein that binds an
essential region in a 

Reference Title: Deformed response element. 
Reference Author: Gross CT, McGinnis W; 
Reference Location: EMBO J 1996;15:1961-1970. 
Reference Number: [2] 
Reference Medline: 98079069 

Reference Title: Molecular cloning, sequence analysis, 
expression, and 

Reference Title: tissue distribution of suppressin, a novel 
suppressor of 

Reference Title: cell cycle entry. 

Reference Author: LeBoeuf RD, Ban EM, Green MM, Stone 
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AS, Propst SIV1, Blalocl< 

Reference Autlior: JE, Tauber JD; 

Reference Location: J Biol Chem 1 998;273:361 -368. 

Database Reference INTERPRO; IPR002893; 

Number of members: 48 


ZncarbOpept 


PDOC00123 


Zinc carboxypeptidases, 
zinc-binding regions 
signatures 


There are a number of different types of zinc-dependent carboxypeptidases (EC
3.4.17.-) [1,2]. All these enzymes seem to be structurally and functionally
related. The enzymes that belong to this family are listed below.

- Carboxypeptidase A1 (EC 3.4.17.1), a pancreatic digestive enzyme that can
remove all C-terminal amino acids with the exception of Arg, Lys and Pro.

- Carboxypeptidase A2 (EC 3.4.17.15), a pancreatic digestive enzyme with a
specificity similar to that of carboxypeptidase A1, but with a preference
for bulkier C-terminal residues.

- Carboxypeptidase B (EC 3.4.17.2), also a pancreatic digestive enzyme, but
one that preferentially removes C-terminal Arg and Lys.

- Carboxypeptidase N (EC 3.4.17.3) (also known as arginine carboxypeptidase),
a plasma enzyme which protects the body from potent vasoactive and
inflammatory peptides containing C-terminal Arg or Lys (such as kinins or
anaphylatoxins) which are released into the circulation.

- Carboxypeptidase H (EC 3.4.17.10) (also known as enkephalin convertase or
carboxypeptidase E), an enzyme located in secretory granules of pancreatic
islets, adrenal gland, pituitary and brain. This enzyme removes residual
C-terminal Arg or Lys remaining after initial endoprotease cleavage during
prohormone processing.

- Carboxypeptidase M (EC 3.4.17.12), a membrane bound Arg and Lys specific
enzyme. It is ideally situated to act on peptide hormones at local tissue
sites where it could control their activity before or after interaction with
specific plasma membrane receptors.

- Mast cell carboxypeptidase (EC 3.4.17.1), an enzyme with a specificity
similar to that of carboxypeptidase A, but found in the secretory granules of
mast cells.

- Streptomyces griseus carboxypeptidase (Cpase SG) (EC 3.4.17.-) [3], which
combines the specificities of mammalian carboxypeptidases A and B.

- Thermoactinomyces vulgaris carboxypeptidase T (EC 3.4.17.18) (CPT) [4],
which also combines the specificities of carboxypeptidases A and B.

- AEBP1 [5], a transcriptional repressor active in preadipocytes. AEBP1 seems
to regulate transcription by cleavage of other transcriptional proteins.

- Yeast hypothetical protein YHR132C.

All of these enzymes bind an atom of zinc. Three conserved residues are
implicated in the binding of the zinc atom: two histidines and a glutamic acid.
We have derived two signature patterns which contain these three zinc-ligands.
Description of pattern(s) and/or profile(s)

Consensus pattern [PK]-x-[LIVMFY]-x-[LIVMFY]-x(4)-H-[STAG]-x-
E-x-[LIVM]-[STAG]-x(6)-[LIVMFYTA] [H and E are zinc ligands]
Sequences known to belong to this class detected by the pattern
ALL.

Other sequence(s) detected in SWISS-PROT Bacillus sphaericus
endopeptidase 1 which hydrolyses the gamma-D-Glu-(L)meso-diaminopimelic
acid bond of spore cortex peptidoglycan [6] and
which is possibly distantly related to zinc carboxypeptidases.

Consensus pattern H-[STAG]-x(3)-[LIVME]-x(2)-[LIVMFYW]-P-
[FYW] [H is a zinc ligand]

Sequences known to belong to this class detected by the pattern
ALL.

Other sequence(s) detected in SWISS-PROT 40.

Note if a protein includes both signatures, the probability of it
being a eukaryotic zinc carboxypeptidase is 100%

Note these proteins belong to families M14A/M14B in the
classification of peptidases [7,E1].
Last update

November 1995 / Patterns and text revised.
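
The two consensus patterns above are written in PROSITE syntax, which maps closely onto regular expressions. The following is a minimal sketch of that mapping, assuming only the syntax elements that appear in these two signatures (residue classes, the wildcard x, and repeat counts). The function names and the synthetic test peptides are hypothetical and are not part of the PROSITE entry.

```python
import re

# The two zinc carboxypeptidase signatures, as given in the entry above.
SIGNATURE_1 = "[PK]-x-[LIVMFY]-x-[LIVMFY]-x(4)-H-[STAG]-x-E-x-[LIVM]-[STAG]-x(6)-[LIVMFYTA]"
SIGNATURE_2 = "H-[STAG]-x(3)-[LIVME]-x(2)-[LIVMFYW]-P-[FYW]"


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern into a Python regular expression.

    Only residue classes such as [STAG], single residues, the wildcard x,
    and fixed repeat counts such as x(4) are supported (an assumption that
    covers the two signatures above, not the full PROSITE syntax).
    """
    parts = []
    for element in pattern.split("-"):
        match = re.fullmatch(r"(\[[A-Z]+\]|[A-Zx])(?:\((\d+)\))?", element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, count = match.groups()
        if residue == "x":
            residue = "."  # x means any residue
        parts.append(residue + (f"{{{count}}}" if count else ""))
    return "".join(parts)


def has_both_signatures(sequence: str) -> bool:
    """Return True if the protein sequence contains both signatures."""
    return all(
        re.search(prosite_to_regex(p), sequence) is not None
        for p in (SIGNATURE_1, SIGNATURE_2)
    )
```

A sequence passing `has_both_signatures` corresponds to the case described in the first note above, where both signatures are present.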
ZZ




Zinc finger present in 
dystrophin, CBP/p300 


Accession number: PF00569 

Definition: Zinc finger present in dystrophin, CBP/p300 
Author: SMART 
Alignment method of seed: Manual 

Source of seed members: Alignment kindly provided by SMART 

Gathering cutoffs: 14 14 

Trusted cutoffs: 14.60 14.60

Noise cutoffs: 10.90 10.90

HMM build command line: hmmbuild HMM SEED

HMM build command line: hmmcalibrate -seed 0 HMM

Reference Number: [1]

Reference Medline: 96402609

Reference Title: ZZ and TAZ: new putative zinc fingers in dystrophin and
Reference Title: other proteins.
Reference Author: Ponting CP, Blake DJ, Davies KE, Kendrick-Jones J, Winder SJ;
Reference Location: Trends Biochem Sci 1996;21:11-13.
Database Reference: EXPERT; Chris.Ponting@human-anatomy.oxford.ac.uk;
Database Reference INTERPRO; IPR000433;
Database reference: PFAMB; PB041629;
Comment: ZZ in dystrophin binds calmodulin 

Comment: Putative zinc finger; binding not yet shown. 

Number of members: 87 
AA. Activities of Polypeptides Comprising Signal Peptides



Polypeptides comprising signal peptides are a family of proteins that typically are
(1) targeted to a particular organelle or intracellular compartment, (2) interact with a
particular molecule or (3) are secreted outside of a host cell. Examples of polypeptides
comprising signal peptides include, without limitation, secreted proteins, soluble proteins,
receptors, proteins retained in the ER, etc.

These proteins comprising signal peptides are useful to modulate ligand-receptor
interactions, cell-to-cell communication, signal transduction, intracellular communication,
and activities and/or chemical cascades that take place in an organism within or outside of any
particular cell.

One class of such proteins is soluble proteins which are transported out of the cell.
These proteins can act as ligands that bind to a receptor to trigger signal transduction or to
permit communication between cells.

Another class is receptor proteins which also comprise a retention domain that lodges
the receptor protein in the membrane when the cell transports the receptor to the surface of
the cell. Like the soluble ligands, receptors can also modulate signal transduction and
communication between cells.

In addition the signal peptide itself can serve as a ligand for some receptors. An
example is the interaction of the ER targeting signal peptide with the signal recognition
particle (SRP). Here, the SRP binds to the signal peptide, halting translation, and the
resulting SRP complex then binds to docking proteins located on the surface of the ER,
prompting transfer of the protein into the ER.

Signal peptide residue composition is described below in Subsection IV.C.1.
III. Methods of Modulating Polypeptide Production

It is contemplated that polynucleotides of the invention can be incorporated into a
host cell or in-vitro system to modulate polypeptide production. For instance, the SDFs
prepared as described herein can be used to prepare expression cassettes useful in a number of
techniques for suppressing or enhancing expression.

For example, polynucleotides comprising sequences to be transcribed, such as
coding sequences of the present invention, can be inserted into nucleic acid constructs to
modulate polypeptide production. Typically, such sequences to be transcribed are
heterologous to at least one element of the nucleic acid construct, so as to generate a chimeric
gene or construct.

Another example of useful polynucleotides is nucleic acid molecules comprising
regulatory sequences of the present invention. Chimeric genes or constructs can be generated
when the regulatory sequences of the invention are linked to heterologous sequences in a vector
construct. Within the scope of the invention are such chimeric genes and/or constructs.

Also within the scope of the invention are nucleic acid molecules, whereof at least a part
or fragment of these DNA molecules is presented in TABLE 1 of the present application, and
wherein the coding sequence is under the control of its own promoter and/or its own regulatory
elements. Such molecules are useful for transforming the genome of a host cell or an organism
regenerated from said host cell for modulating polypeptide production.

Additionally, a vector capable of producing the oligonucleotide can be inserted into the
host cell to deliver the oligonucleotide.

A more detailed description of components to be included in vector constructs is
provided both above and below.

Whether the chimeric vectors or native nucleic acids are utilized, such
polynucleotides can be incorporated into a host cell to modulate polypeptide production.
Native genes and/or nucleic acid molecules can be effective when exogenous to the host cell.

Methods of modulating polypeptide expression include, without limitation:

Suppression methods, such as
Antisense
Ribozymes
Co-suppression
Insertion of Sequences into the Gene to be Modulated
Regulatory Sequence Modulation

as well as Methods for Enhancing Production, such as
Insertion of Exogenous Sequences; and
Regulatory Sequence Modulation.

III.A. Suppression

Expression cassettes of the invention can be used to suppress expression of
endogenous genes which comprise the SDF sequence. Inhibiting expression can be useful,
for instance, to tailor the ripening characteristics of a fruit (Oeller et al., Science 254:437
(1991)) or to influence seed size (WO98/07842) or to provoke cell ablation (Mariani et al.,
Nature 357:384-387 (1992)).

As described above, a number of methods can be used to inhibit gene expression in
plants, such as antisense, ribozyme, introduction of exogenous genes into a host cell,
insertion of a polynucleotide sequence into the coding sequence and/or the promoter of the
endogenous gene of interest, and the like.
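
Of the methods just listed, the antisense approach depends on expressing RNA that is complementary to the target mRNA. As a minimal illustration, the sketch below derives the antisense RNA sequence from a coding (sense) strand. The function names and the six-base input are hypothetical and are not taken from TABLE 1.

```python
# Map each DNA base to its Watson-Crick complement (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence, written 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def antisense_rna(coding_strand: str) -> str:
    """Return the RNA complementary to the mRNA of the given coding strand.

    The mRNA has the same sequence as the coding strand (with U for T), so
    the antisense RNA is the reverse complement of the coding strand,
    written 5' to 3' with U in place of T.
    """
    return reverse_complement(coding_strand).upper().replace("T", "U")


if __name__ == "__main__":
    print(antisense_rna("ATGGCC"))
```

In a construct, this corresponds to placing the sequence in reverse orientation with respect to the promoter.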

III.A.1. Antisense

An expression cassette as described above can be transformed into a host cell or
plant to produce an antisense strand of RNA. For plant cells, antisense RNA inhibits gene
expression by preventing the accumulation of mRNA which encodes the enzyme of interest, see,
e.g., Sheehy et al., Proc. Nat. Acad. Sci. USA, 85:8805 (1988), and Hiatt et al., U.S. Patent No.
4,801,340.

III.A.2. Ribozymes

Similarly, ribozyme constructs can be transformed into a plant to cleave mRNA
and down-regulate translation.

III.A.3. Co-Suppression

Another method of suppression is by introducing an exogenous copy of the gene
to be suppressed. Introduction of expression cassettes in which a nucleic acid is configured in
the sense orientation with respect to the promoter has been shown to prevent the accumulation of
mRNA. A detailed description of this method is provided above.

III.A.4. Insertion of Sequences into the Gene to be Modulated

Yet another means of suppressing gene expression is to insert a polynucleotide
into the gene of interest to disrupt transcription or translation of the gene.

Homologous recombination could be used to target a polynucleotide insert to a
gene using the Cre-Lox system (A.C. Vergunst et al., Nucleic Acids Res. 26:2729 (1998), A.C.
Vergunst et al., Plant Mol. Biol. 38:393 (1998), H. Albert et al., Plant J. 2:649 (1995)).

In addition, random insertion of polynucleotides into a host cell genome can also
be used to disrupt the gene of interest, Azpiroz-Leehan et al., Trends in Genetics 13:152 (1997).
In this method, screening for clones from a library containing random insertions is preferred for
identifying those that have polynucleotides inserted into the gene of interest. Such screening can
be performed using probes and/or primers described above based on sequences from TABLE 1,
fragments thereof, and substantially similar sequences thereto. The screening can also be
performed by selecting clones or any transgenic plants having a desired phenotype.

III.A.5. Regulatory Sequence Modulation

The SDFs described in Table 1, and fragments thereof, are examples of
nucleotides of the invention that contain regulatory sequences that can be used to suppress or
inactivate transcription and/or translation from a gene of interest as discussed in I.C.5.



III.A.6. Genes Comprising Dominant-Negative Mutations

When suppression of production of the endogenous, native protein is desired it
is often helpful to express a gene comprising a dominant negative mutation. Production of
protein variants produced from genes comprising dominant negative mutations is a useful
tool for research. Genes comprising dominant negative mutations can produce a variant
polypeptide which is capable of competing with the native polypeptide, but which does not
produce the native result. Consequently, over expression of genes comprising these mutations
can titrate out an undesired activity of the native protein. For example, the product from a
gene comprising a dominant negative mutation of a receptor can be used to constitutively
activate or suppress a signal transduction cascade, allowing examination of the phenotype
and thus the trait(s) controlled by that receptor and pathway. Alternatively, the protein arising
from the gene comprising a dominant-negative mutation can be an inactive enzyme still capable
of binding to the same substrate as the native protein and therefore competes with such native
protein.

Products from genes comprising dominant-negative mutations can also act upon
the native protein itself to prevent activity. For example, the native protein may be active only
as a homo-multimer or as one subunit of a hetero-multimer. Incorporation of an inactive subunit
into the multimer with native subunit(s) can inhibit activity.

Thus, gene function can be modulated in host cells of interest by insertion into
these cells vector constructs comprising a gene comprising a dominant-negative mutation.
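
The multimer case described above can be illustrated with simple arithmetic. The sketch below rests on assumptions that are not stated in the text: subunits are drawn at random from a mixed pool of native and inactive subunits, and a single inactive subunit inactivates the whole multimer. Under those assumptions, the fraction of multimers that remain fully native is (1 - f)^n for mutant fraction f and n subunits. The function name is hypothetical.

```python
def fraction_fully_native(mutant_fraction: float, subunits: int) -> float:
    """Fraction of multimers made up only of native subunits.

    Assumes random assembly from a pool in which `mutant_fraction` of the
    subunits carry the dominant-negative mutation (illustrative model only).
    """
    if not 0.0 <= mutant_fraction <= 1.0:
        raise ValueError("mutant_fraction must be between 0 and 1")
    if subunits < 1:
        raise ValueError("subunits must be at least 1")
    return (1.0 - mutant_fraction) ** subunits


if __name__ == "__main__":
    # Equal amounts of native and mutant subunits in a tetramer.
    print(fraction_fully_native(0.5, 4))
```

Under this model, the larger the multimer, the more strongly a given level of mutant expression reduces the pool of fully native complexes.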

III.B. Enhanced Expression 

Enhanced expression of a gene of interest in a host cell can be accomplished by either 
(1) insertion of an exogenous gene; or (2) promoter modulation. 

III.B.1. Insertion of an Exogenous Gene

Insertion of an expression construct encoding an exogenous gene can boost the
number of gene copies expressed in a host cell.

Such expression constructs can comprise genes that either encode the native
protein that is of interest or that encode a variant that exhibits enhanced activity as compared to
the native protein. Such genes encoding proteins of interest can be constructed from the
sequences from TABLE 1, fragments thereof, and substantially similar sequences thereto.

Such an exogenous gene can include either a constitutive promoter permitting 
expression in any cell in a host organism or a promoter that directs transcription only in 
particular cells or times during a host cell life cycle or in response to environmental stimuli. 



III.B.2. Regulatory Sequence Modulation

The SDFs of Table 1, and fragments thereof, contain regulatory sequences that
can be used to enhance expression of a gene of interest. For example, some of these sequences
contain useful enhancer elements. In some cases, duplication of enhancer elements or insertion
of exogenous enhancer elements will increase expression of a desired gene from a particular
promoter. As other examples, all 11 promoters require binding of a regulatory protein to be
activated, while some promoters may need a protein that signals a promoter binding protein to
expose a polymerase binding site. In either case, over-production of such proteins can be used
to enhance expression of a gene of interest by increasing the activation time of the promoter.
Such regulatory proteins are encoded by some of the sequences in TABLE 1,
fragments thereof, and substantially similar sequences thereto.

Coding sequences for these proteins can be constructed as described above.
IV. Gene Constructs and Vector Construction

To use isolated SDFs of the present invention or a combination of them or parts and/or
mutants and/or fusions of said SDFs in the above techniques, recombinant DNA vectors which
comprise said SDFs and are suitable for transformation of cells, such as plant cells, are usually
prepared. The SDF construct can be made using standard recombinant DNA techniques
(Sambrook et al. 1989) and can be introduced to the species of interest by Agrobacterium-mediated
transformation or by other means of transformation (e.g., particle gun
bombardment) as referenced below.

The vector backbone can be any of those typical in the art such as plasmids, viruses,
artificial chromosomes, BACs, YACs and PACs and vectors of the sort described by

(a) BAC: Shizuya et al., Proc. Natl. Acad. Sci. USA 89: 8794-8797 (1992);
Hamilton et al., Proc. Natl. Acad. Sci. USA 93: 9975-9979 (1996);

(b) YAC: Burke et al., Science 236:806-812 (1987);

(c) PAC: Sternberg N. et al., Proc Natl Acad Sci USA. Jan;87(1):103-7 (1990);

(d) Bacteria-Yeast Shuttle Vectors: Bradshaw et al., Nucl Acids Res 23: 4850-4856
(1995);

(e) Lambda Phage Vectors: Replacement Vector, e.g.,
Frischauf et al., J. Mol Biol 170: 827-842 (1983); or Insertion vector, e.g.,
Huynh et al., In: Glover NM (ed) DNA Cloning: A practical Approach, Vol. 1 Oxford: IRL
Press (1985);

(f) T-DNA gene fusion vectors: Walden et al., Mol Cell Biol 1: 175-194 (1990);
and

(g) Plasmid vectors: Sambrook et al., infra.

Typically, a vector will comprise the exogenous gene, which in its turn comprises an
SDF of the present invention to be introduced into the genome of a host cell, and which gene
may be an antisense construct, a ribozyme construct, chimeraplast, or a coding sequence with
any desired transcriptional and/or translational regulatory sequences, such as promoters, UTRs,
and 3' end termination sequences. Vectors of the invention can also include origins of
replication, scaffold attachment regions (SARs), markers, homologous sequences, introns, etc.

A DNA sequence coding for the desired polypeptide, for example a cDNA sequence
encoding a full length protein, will preferably be combined with transcriptional and translational
initiation regulatory sequences which will direct the transcription of the sequence from the gene
in the intended tissues of the transformed plant.

For example, for over-expression, a plant promoter fragment may be employed that will
direct transcription of the gene in all tissues of a regenerated plant. Alternatively, the plant
promoter may direct transcription of an SDF of the invention in a specific tissue (tissue-specific
promoters) or may be otherwise under more precise environmental control (inducible
promoters).

If proper polypeptide production is desired, a polyadenylation region at the 3'-end of the
coding region is typically included. The polyadenylation region can be derived from the natural
gene, from a variety of other plant genes, or from T-DNA.

The vector comprising the sequences from genes or SDFs of the invention may
comprise a marker gene that confers a selectable phenotype on plant cells. The vector can
include promoter and coding sequence, for instance. For example, the marker may encode
biocide resistance, particularly antibiotic resistance, such as resistance to kanamycin, G418,
bleomycin, hygromycin, or herbicide resistance, such as resistance to chlorosulfuron or
phosphinothricin.

IV.A. Coding Sequences 

Generally, the sequence in the transformation vector and to be introduced into
the genome of the host cell does not need to be absolutely identical to an SDF of the present
invention. Also, it is not necessary for it to be full length, relative to either the primary
transcription product or fully processed mRNA. Furthermore, the introduced sequence need not
have the same intron or exon pattern as a native gene. Also, heterologous non-coding segments
can be incorporated into the coding sequence without changing the desired amino acid sequence
of the polypeptide to be produced.

IV.B. Promoters

As explained above, introducing an exogenous SDF from the same species or an
orthologous SDF from another species can modulate the expression of a native gene
corresponding to that SDF of interest. Such an SDF construct can be under the control of
either a constitutive promoter or a highly regulated inducible promoter (e.g., a copper
inducible promoter). The promoter of interest can initially be either endogenous or
heterologous to the species in question. When re-introduced into the genome of said species,
such promoter becomes exogenous to said species. Over-expression of an SDF transgene can
lead to co-suppression of the homologous endogenous sequence thereby creating some
alterations in the phenotypes of the transformed species as demonstrated by similar analysis
of the chalcone synthase gene (Napoli et al., Plant Cell 2:279 (1990) and van der Krol et al.,
Plant Cell 2:291 (1990)). If an SDF is found to encode a protein with desirable
characteristics, its over-production can be controlled so that its accumulation can be
manipulated in an organ- or tissue-specific manner utilizing a promoter having such
specificity.

Likewise, if the promoter of an SDF (or an SDF that includes a promoter) is found to
be tissue-specific or developmentally regulated, such a promoter can be utilized to drive or
facilitate the transcription of a specific gene of interest (e.g., seed storage protein or
root-specific protein). Thus, the level of accumulation of a particular protein can be manipulated
or its spatial localization in an organ- or tissue-specific manner can be altered.



IV.C. Signal Peptides

SDFs of the present invention containing signal peptides are indicated in Table 1. In
some cases it may be desirable for the protein encoded by an introduced exogenous or
orthologous SDF to be targeted (1) to a particular organelle or intracellular compartment, (2) to
interact with a particular molecule such as a membrane molecule or (3) for secretion outside
of the cell harboring the introduced SDF. This will be accomplished using a signal peptide.

Signal peptides direct protein targeting, are involved in ligand-receptor interactions
and act in cell to cell communication. Many proteins, especially soluble proteins, contain a
signal peptide that targets the protein to one of several different intracellular compartments.
In plants, these compartments include, but are not limited to, the endoplasmic reticulum (ER),
mitochondria, plastids (such as chloroplasts), the vacuole, the Golgi apparatus, protein
storage vesicles (PSV) and, in general, membranes. Some signal peptide sequences are
conserved, such as the Asn-Pro-Ile-Arg amino acid motif found in the N-terminal propeptide
signal that targets proteins to the vacuole (Marty (1999) The Plant Cell 11: 587-599). Other
signal peptides do not have a consensus sequence per se, but are largely composed of
hydrophobic amino acids, such as those signal peptides targeting proteins to the ER (Vitale
and Denecke (1999) The Plant Cell 11: 615-628). Still others do not appear to contain either
a consensus sequence or an identified common secondary sequence, for instance the
chloroplast stromal targeting signal peptides (Keegstra and Cline (1999) The Plant Cell 11:
557-570). Furthermore, some targeting peptides are bipartite, directing proteins first to an
organelle and then to a membrane within the organelle (e.g., within the thylakoid lumen of the
chloroplast; see Keegstra and Cline (1999) The Plant Cell 11: 557-570). In addition to the
diversity in sequence and secondary structure, placement of the signal peptide is also varied.
Proteins destined for the vacuole, for example, have targeting signal peptides found at the
N-terminus, at the C-terminus and at a surface location in mature, folded proteins. Signal
peptides also serve as ligands for some receptors.
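
Two of the sequence features mentioned above, the conserved Asn-Pro-Ile-Arg vacuolar motif and the hydrophobic character of ER signal peptides, can be checked with very little code. The sketch below is illustrative only: the function names, the choice of which residues count as hydrophobic, and the example peptides are assumptions, and it is not a signal peptide prediction method.

```python
# Residues treated as hydrophobic for this illustration (an assumed set).
HYDROPHOBIC = set("AVLIMFWC")

# Asn-Pro-Ile-Arg in one-letter code.
VACUOLAR_MOTIF = "NPIR"


def hydrophobic_fraction(peptide: str) -> float:
    """Return the fraction of residues in the peptide that are hydrophobic."""
    peptide = peptide.upper()
    if not peptide:
        raise ValueError("peptide must not be empty")
    return sum(residue in HYDROPHOBIC for residue in peptide) / len(peptide)


def has_vacuolar_motif(propeptide: str) -> bool:
    """Return True if the Asn-Pro-Ile-Arg motif occurs in the propeptide."""
    return VACUOLAR_MOTIF in propeptide.upper()


if __name__ == "__main__":
    print(hydrophobic_fraction("AVLK"))       # hypothetical peptide
    print(has_vacuolar_motif("SSRFNPIRLPT"))  # hypothetical propeptide
```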

These characteristics of signal peptides can be used to more tightly control the
phenotypic expression of introduced SDFs. In particular, associating the appropriate signal
sequence with a specific SDF can allow sequestering of the protein in specific organelles
(plastids, as an example), secretion outside of the cell, targeting interaction with particular
receptors, etc. Hence, the inclusion of signal peptides in constructs involving the SDFs of the
invention increases the range of manipulation of SDF phenotypic expression. The nucleotide
sequence of the signal peptide can be isolated from characterized genes using common
molecular biological techniques or can be synthesized in vitro.

In addition, the native signal peptide sequences, both amino acid and nucleotide, 
described in Table 1 can be used to modulate polypeptide transport. Further variants of the 
native signal peptides described in Table 1 are contemplated. Insertions, deletions, or 
substitutions can be made. Such variants will retain at least one of the functions of the native 
signal peptide as well as exhibiting some degree of sequence identity to the native sequence. 

Also, fragments of the signal peptides of the invention are useful and can be fused with 
other signal peptides of interest to modulate transport of a polypeptide. 

V. Transformation Techniques 

A wide range of techniques for inserting exogenous polynucleotides are known for a
number of host cells, including, without limitation, bacterial, yeast, mammalian, insect and plant
cells.

Techniques for transforming a wide variety of higher plant species are well known and
described in the technical and scientific literature. See, e.g., Weising et al., Ann. Rev. Genet.
22:421 (1988); and Christou, Euphytica, v. 85, n.1-3:13-27, (1995).

DNA constructs of the invention may be introduced into the genome of the desired plant
host by a variety of conventional techniques. For example, the DNA construct may be
introduced directly into the genomic DNA of the plant cell using techniques such as
electroporation and microinjection of plant cell protoplasts, or the DNA constructs can be
introduced directly to plant tissue using ballistic methods, such as DNA particle bombardment.
Alternatively, the DNA constructs may be combined with suitable T-DNA flanking regions and
introduced into a conventional Agrobacterium tumefaciens host vector. The virulence functions
of the Agrobacterium tumefaciens host will direct the insertion of the construct and adjacent
marker into the plant cell DNA when the cell is infected by the bacteria (McCormac et al., Mol.
Biotechnol. 8:199 (1997); Hamilton, Gene 200:107 (1997); Salomon et al. EMBO J. 2:141
(1984); Herrera-Estrella et al. EMBO J. 2:987 (1983)).

Microinjection techniques are known in the art and well described in the scientific and
patent literature. The introduction of DNA constructs using polyethylene glycol precipitation is
described in Paszkowski et al. EMBO J. 3:2717 (1984). Electroporation techniques are
described in Fromm et al. Proc. Natl. Acad. Sci. USA 82:5824 (1985). Ballistic transformation
techniques are described in Klein et al. Nature 327:773 (1987). Agrobacterium
tumefaciens-mediated transformation techniques, including disarming and use of binary or
co-integrate vectors, are well described in the scientific literature. See, for example Hamilton, CM.,
Gene 200:107 (1997); Muller et al. Mol. Gen. Genet. 207:171 (1987); Komari et al. Plant J.
10:165 (1996); Venkateswarlu et al. Biotechnology 9:1103 (1991) and Gleave, AP., Plant Mol.
Biol. 20:1203 (1992); Graves and Goldman, Plant Mol. Biol. 7:34 (1986) and Gould et al., Plant
Physiology 95:426 (1991).

Transformed plant cells which are derived by any of the above transformation
techniques can be cultured to regenerate a whole plant that possesses the transformed genotype
and thus the desired phenotype such as seedlessness. Such regeneration techniques rely on
manipulation of certain phytohormones in a tissue culture growth medium, typically relying on a
biocide and/or herbicide marker which has been introduced together with the desired nucleotide
sequences. Plant regeneration from cultured protoplasts is described in Evans et al., Protoplasts
Isolation and Culture in "Handbook of Plant Cell Culture," pp. 124-176, MacMillan Publishing
Company, New York, 1983; and Binding, Regeneration of Plants, Plant Protoplasts, pp. 21-73,
CRC Press, Boca Raton, 1988. Regeneration can also be obtained from plant callus, explants,
organs, or parts thereof. Such regeneration techniques are described generally in Klee et al. Ann.
Rev. of Plant Phys. 38:467 (1987). Regeneration of monocots (rice) is described by Hosoyama
et al. (Biosci. Biotechnol. Biochem. 58:1500 (1994)) and by Ghosh et al. (J. Biotechnol. 32:1
(1994)). The nucleic acids of the invention can be used to confer desired traits on essentially any
plant.

Thus, the invention has use over a broad range of plants, including species from the
genera Anacardium, Arachis, Asparagus, Atropa, Avena, Brassica, Citrus, Citrullus, Capsicum,
Carthamus, Cocos, Coffea, Cucumis, Cucurbita, Daucus, Elaeis, Fragaria, Glycine, Gossypium,
Helianthus, Heterocallis, Hordeum, Hyoscyamus, Lactuca, Linum, Lolium, Lupinus,
Lycopersicon, Malus, Manihot, Majorana, Medicago, Nicotiana, Olea, Oryza, Panicum,
Pannesetum, Persea, Phaseolus, Pistachia, Pisum, Pyrus, Prunus, Raphanus, Ricinus, Secale,
Senecio, Sinapis, Solanum, Sorghum, Theobromus, Trigonella, Triticum, Vicia, Vitis, Vigna,
and Zea.

One of skill will recognize that after the expression cassette is stably incorporated in
transgenic plants and confirmed to be operable, it can be introduced into other plants by
sexual crossing. Any of a number of standard breeding techniques can be used, depending
upon the species to be crossed.

The particular sequences of SDFs identified are provided in the attached TABLE 1.
One of ordinary skill in the art, having this data, can obtain cloned DNA fragments, synthetic
DNA fragments or polypeptides constituting desired sequences by recombinant methodology
known in the art or described herein.

EXAMPLES

The invention is illustrated by way of the following examples. The invention is not
limited by these examples as the scope of the invention is defined solely by the claims
following.

EXAMPLE 1: cDNA PREPARATION

A number of the nucleotide sequences disclosed in TABLE 1 herein as representative of
the SDFs of the invention can be obtained by sequencing genomic DNA (gDNA) and/or cDNA
from corn plants grown from HYBRID SEED # 35A19, purchased from Pioneer Hi-Bred
International, Inc., Supply Management, P.O. Box 256, Johnston, Iowa 50131-0256.

A number of the nucleotide sequences disclosed in TABLE 1 herein as representative
of the SDFs of the invention can also be obtained by sequencing genomic DNA from
Arabidopsis thaliana, Wassilewskija ecotype or by sequencing cDNA obtained from mRNA
from such plants as described below. This is a true breeding strain. Seeds of the plant are
available from the Arabidopsis Biological Resource Center at the Ohio State University,
under the accession number CS2360. Seeds of this plant were deposited under the terms and
conditions of the Budapest Treaty at the American Type Culture Collection, Manassas, VA
on August 31, 1999, and were assigned ATCC No. PTA-595.

Other methods for cloning full-length cDNA are described, for example, by Seki et
al., Plant Journal 15:707-720 (1998) "High-efficiency cloning of Arabidopsis full-length
cDNA by biotinylated Cap trapper"; Maruyama et al., Gene 138:171 (1994) "Oligo-capping a
simple method to replace the cap structure of eukaryotic mRNAs with oligoribonucleotides";
and WO 96/34981.

Tissues were, or each organ was, individually pulverized and frozen in liquid
nitrogen. Next, the samples were homogenized in the presence of detergents and then
centrifuged. The debris and nuclei were removed from the sample and more detergents were
added to the sample. The sample was centrifuged and the debris was removed. Then the
sample was applied to a 2M sucrose cushion to isolate polysomes. The RNA was isolated by
treatment with detergents and proteinase K followed by ethanol precipitation and
centrifugation. The polysomal RNA from the different tissues was pooled according to the
following mass ratios: 15/15/1 for male inflorescences, female inflorescences and root,
respectively. The pooled material was then used for cDNA synthesis by the methods
described below.
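
The pooling arithmetic can be sketched as follows. This is only an illustration of splitting a target mass by the stated 15/15/1 ratio; the function name `pool_amounts` and the 310 µg total are hypothetical and do not come from the procedure itself.

```python
def pool_amounts(total_ug, ratios):
    """Split a total RNA mass (in µg) across tissues according to mass ratios."""
    total_parts = sum(ratios.values())
    return {tissue: total_ug * parts / total_parts for tissue, parts in ratios.items()}

# Mass ratios stated in the text for the corn polysomal RNA pool.
CORN_RATIOS = {"male inflorescence": 15, "female inflorescence": 15, "root": 1}

# Hypothetical total of 310 µg, chosen only so the shares come out as round numbers.
corn_pool = pool_amounts(310, CORN_RATIOS)
for tissue, micrograms in corn_pool.items():
    print(f"{tissue}: {micrograms:.1f} µg")
```

The same helper applies to any other ratio, for example a 9:1 split by passing `{"inflorescence": 9, "root": 1}`.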



Starting material for cDNA synthesis for the exemplary corn cDNA clones
with sequences presented in TABLE 1 was poly(A)-containing polysomal mRNAs from
inflorescences and root tissues of corn plants grown from HYBRID SEED # 35A19. Male
inflorescences and female (pre- and post-fertilization) inflorescences were isolated at various
stages of development. Selection for poly(A)-containing polysomal RNA was done using
oligo d(T) cellulose columns, as described by Cox and Goldberg, "Plant Molecular Biology:
A Practical Approach", pp. 1-35, Shaw ed., c. 1988 by IRL, Oxford. The quality and the
integrity of the polyA+ RNAs were evaluated.

Starting material for cDNA synthesis for the exemplary Arabidopsis cDNA
clones with sequences presented in TABLE 1 was polysomal RNA isolated from the top-most
inflorescence tissues of Arabidopsis thaliana Wassilewskija (Ws.) and from roots of
Arabidopsis thaliana Landsberg erecta (L. er.), also obtained from the Arabidopsis
Biological Resource Center. Nine parts inflorescence to every part root was used, as
measured by wet mass. Tissue was pulverized and exposed to liquid nitrogen. Next, the
sample was homogenized in the presence of detergents and then centrifuged. The debris and
nuclei were removed from the sample and more detergents were added to the sample. The
sample was centrifuged and the debris was removed and the sample was applied to a 2M
sucrose cushion to isolate polysomal RNA (Cox et al., "Plant Molecular Biology: A Practical
Approach", pp. 1-35, Shaw ed., c. 1988 by IRL, Oxford). The polysomal RNA was used
for cDNA synthesis by the methods described below. Polysomal mRNA was then isolated as
described above for corn cDNA. The quality of the RNA was assessed electrophoretically.
Following preparation of the mRNAs from various tissues as described above, selection
of mRNA with intact 5' ends and specific attachment of an oligonucleotide tag to the 5' end of
such mRNA was performed using either a chemical or enzymatic approach. Both techniques
take advantage of the presence of the "cap" structure, which characterizes the 5' end of most
intact mRNAs and which comprises a guanosine generally methylated once, at the 7 position.

The chemical modification approach involves the optional elimination of the 2', 3'-cis
diol of the 3' terminal ribose, the oxidation of the 2', 3'-cis diol of the ribose linked to the cap of
the 5' ends of the mRNAs into a dialdehyde, and the coupling of the dialdehyde so obtained to
a derivatized oligonucleotide tag. Further detail regarding the chemical approaches for
obtaining mRNAs having intact 5' ends is disclosed in International Application No.
WO 96/34981 published November 7, 1996.

The enzymatic approach for ligating the oligonucleotide tag to the intact 5' ends of
mRNAs involves the removal of the phosphate groups present on the 5' ends of uncapped
incomplete mRNAs, the subsequent decapping of mRNAs having intact 5' ends and the ligation
of the phosphate present at the 5' end of the decapped mRNA to an oligonucleotide tag. Further
detail regarding the enzymatic approaches for obtaining mRNAs having intact 5' ends is
disclosed in Dumas Milne Edwards J.B. (Doctoral Thesis of Paris VI University, Le clonage des
ADNc complets: difficultés et perspectives nouvelles. Apports pour l'étude de la régulation de
l'expression de la tryptophane hydroxylase de rat, 20 Dec. 1993), EPO 625572 and Kato et al.,
Gene 150:243-250 (1994).

In both the chemical and the enzymatic approach, the oligonucleotide tag has a
restriction enzyme site (e.g. an EcoRI site) therein to facilitate later cloning procedures.
Following attachment of the oligonucleotide tag to the mRNA, the integrity of the mRNA is
examined by performing a Northern blot using a probe complementary to the oligonucleotide
tag.

For the mRNAs joined to oligonucleotide tags using either the chemical or the enzymatic
method, first strand cDNA synthesis is performed using an oligo-dT primer with reverse
transcriptase. This oligo-dT primer can contain an internal tag of at least 4 nucleotides, which
can be different from one mRNA preparation to another. Methylated dCTP is used for cDNA
first strand synthesis to protect the internal EcoRI sites from digestion during subsequent steps.
The first strand cDNA is precipitated using isopropanol after removal of RNA by alkaline
hydrolysis to eliminate residual primers.

Second strand cDNA synthesis is conducted using a DNA polymerase, such as Klenow
fragment, and a primer corresponding to the 5' end of the ligated oligonucleotide. The primer is
typically 20-25 bases in length. Methylated dCTP is used for second strand synthesis in order to
protect internal EcoRI sites in the cDNA from digestion during the cloning process.

Following second strand synthesis, the full-length cDNAs are cloned into a phagemid
vector, such as pBlueScript™ (Stratagene). The ends of the full-length cDNAs are blunted with
T4 DNA polymerase (Biolabs) and the cDNA is digested with EcoRI. Since methylated dCTP
is used during cDNA synthesis, the EcoRI site present in the tag is the only hemi-methylated
site; hence the only site susceptible to EcoRI digestion. In some instances, to facilitate
subcloning, a Hind III adapter is added to the 3' end of full-length cDNAs.

The full-length cDNAs are then size fractionated using either exclusion chromatography
(AcA, Biosepra) or electrophoretic separation which yields 3 to 6 different fractions. The
full-length cDNAs are then directionally cloned into pBlueScript™ using either the EcoRI and
SmaI restriction sites or, when the Hind III adapter is present in the full-length cDNAs, the
EcoRI and Hind III restriction sites. The ligation mixture is transformed, preferably by
electroporation, into bacteria, which are then propagated under appropriate antibiotic selection.

Clones containing the oligonucleotide tag attached to full-length cDNAs are selected as
follows.

The plasmid cDNA libraries made as described above are purified (e.g. by a column
available from Qiagen). A positive selection of the tagged clones is performed as follows.
Briefly, in this selection procedure, the plasmid DNA is converted to single stranded DNA using
phage F1 gene II endonuclease in combination with an exonuclease (Chang et al., Gene 127:95
(1993)) such as exonuclease III or T7 gene 6 exonuclease. The resulting single stranded DNA is
then purified using paramagnetic beads as described by Fry et al., Biotechniques 12:124 (1992).
Here the single stranded DNA is hybridized with a biotinylated oligonucleotide having a
sequence corresponding to the 3' end of the oligonucleotide tag. Preferably, the primer has a
length of 20-25 bases. Clones including a sequence complementary to the biotinylated
oligonucleotide are selected by incubation with streptavidin coated magnetic beads followed by
magnetic capture. After capture of the positive clones, the plasmid DNA is released from the
magnetic beads and converted into double stranded DNA using a DNA polymerase such as
ThermoSequenase™ (obtained from Amersham Pharmacia Biotech). Alternatively, protocols
such as the Gene Trapper™ kit (Gibco BRL) can be used. The double stranded DNA is then
transformed, preferably by electroporation, into bacteria. The percentage of positive clones
having the 5' tag oligonucleotide is typically estimated to be between 90 and 98% from dot blot
analysis.
Following transformation, the libraries are ordered in microtiter plates and sequenced.
The Arabidopsis library was deposited at the American Type Culture Collection on January
7, 2000 as "E-coli liba 010600" under the accession number PTA-1161.

EXAMPLE 2: SOUTHERN HYBRIDIZATIONS

The SDFs of the invention can be used in Southern hybridizations as described above.

The following describes extraction of DNA from nuclei of plant cells, digestion of the
nuclear DNA and separation by length, transfer of the separated fragments to membranes,
preparation of probes for hybridization, hybridization and detection of the hybridized probe.
The procedures described herein can be used to isolate related polynucleotides or for
diagnostic purposes. Moderate stringency hybridization conditions, as defined above, are
described in the present example. These conditions result in detection of hybridization
between sequences having at least 70% sequence identity. As described above, the
hybridization and wash conditions can be changed to reflect the desired percentage of
sequence identity between probe and target sequences that can be detected.

In the following procedure, a probe for hybridization is produced from two PCR
reactions using two primers from genomic sequence of Arabidopsis thaliana. As described
above, the particular template for generating the probe can be any desired template.

The first PCR product is assessed to confirm that it is of
the expected size. Then the product of the first PCR is used as a template, with the same pair
of primers used in the first PCR, in a second PCR that produces a labeled product used as the
probe.

Fragments detected by hybridization, or other bands of interest, can be isolated from
gels used to separate genomic DNA fragments by known methods for further purification
and/or characterization.



Buffers for nuclear DNA extraction

1. 10X HB

Component                per 1000 ml   Purpose
40 mM spermidine         10.2 g        Spermine (Sigma S-2876) and spermidine (Sigma
10 mM spermine           3.5 g         S-2501) stabilize chromatin and the nuclear membrane
0.1 M EDTA (disodium)    37.2 g        EDTA inhibits nuclease
0.1 M Tris               12.1 g        Buffer
0.8 M KCl                59.6 g        Adjusts ionic strength for stability of nuclei

Adjust pH to 9.5 with 10 N NaOH. It appears that there is a nuclease present in
leaves. Use of pH 9.5 appears to inactivate this nuclease.

2. 2 M sucrose (684 g per 1000 ml)

Heat about half the final volume of water to about 50°C. Add the sucrose slowly, then
bring the mixture to close to final volume; stir constantly until it has dissolved. Bring
the solution to volume.

3. Sarkosyl solution (lyses nuclear membranes)

Component                         per 1000 ml
N-lauroyl sarcosine (Sarkosyl)    20.0 g
0.1 M Tris                        12.1 g
0.04 M EDTA (disodium)            14.9 g

Adjust the pH to 9.5 after all the components are dissolved and bring up to the proper
volume.



4. 20% Triton X-100
80 ml Triton X-100
320 ml 1X HB (w/o β-ME and PMSF)
Prepare in advance; Triton takes some time to dissolve
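
The gram amounts listed for these stock buffers follow from molarity times formula weight times volume. The sketch below recomputes them as a consistency check. The formula weights, and in particular the salt and hydrate forms, are assumptions made for illustration, since the buffer list gives reagent names and catalog numbers only.

```python
# Formula weights in g/mol. The salt and hydrate forms are assumptions; the source
# lists only reagent names and catalog numbers.
FORMULA_WEIGHT = {
    "spermidine trihydrochloride": 254.63,
    "spermine tetrahydrochloride": 348.18,
    "disodium EDTA dihydrate": 372.24,
    "Tris base": 121.14,
    "KCl": 74.55,
    "sucrose": 342.30,
}

def grams_needed(molarity, formula_weight, volume_l=1.0):
    """Mass of solute (g) for a solution of the given molarity and volume."""
    return molarity * formula_weight * volume_l

# Molar concentrations of the 10X HB stock as listed in the buffer table.
TEN_X_HB = {
    "spermidine trihydrochloride": 0.040,
    "spermine tetrahydrochloride": 0.010,
    "disodium EDTA dihydrate": 0.1,
    "Tris base": 0.1,
    "KCl": 0.8,
}

hb_grams = {name: grams_needed(m, FORMULA_WEIGHT[name]) for name, m in TEN_X_HB.items()}
for name, grams in hb_grams.items():
    print(f"{name}: {grams:.1f} g per liter")

sucrose_grams = grams_needed(2.0, FORMULA_WEIGHT["sucrose"])
sarkosyl_edta_grams = grams_needed(0.04, FORMULA_WEIGHT["disodium EDTA dihydrate"])
print(f"sucrose (2 M): {sucrose_grams:.1f} g per liter")
print(f"EDTA in Sarkosyl solution (0.04 M): {sarkosyl_edta_grams:.1f} g per liter")
```

Under these assumed formula weights the computed masses agree with the listed amounts to within rounding.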



A. Procedure

1. Prepare 1X "H" buffer (keep ice-cold during use)

Component              per 1000 ml   Purpose
10X HB                 100 ml
2 M sucrose            250 ml        a non-ionic osmoticum
Water                  634 ml

Added just before use:

100 mM PMSF*           10 ml         a protease inhibitor; protects
                                     nuclear membrane proteins
β-mercaptoethanol      1 ml          inactivates nuclease by reducing
                                     disulfide bonds

*100 mM PMSF (phenyl methyl sulfonyl fluoride, Sigma P-7626):
add 0.0875 g to 5 ml 100% ethanol

2. Homogenize the tissue in a blender (use 300-400 ml of 1X HB per blender). Be sure
that you use 5-10 ml of HB buffer per gram of tissue. Blenders generate heat, so be
sure to keep the homogenate cold. It is necessary to put the blenders in ice
periodically.



3. Add the 20% Triton X-100 (25 ml per liter of homogenate) and gently stir on ice for
20 min. This lyses plastid, but not nuclear, membranes.

4. Filter the tissue suspension through several nylon filters into an ice-cold beaker. The
first filtration is through a 250-micron membrane; the second is through an 85-micron
membrane; the third is through a 50-micron membrane; and the fourth is through a
20-micron membrane. Use a large funnel to hold the filters. Filtration can be sped up
by gently squeezing the liquid through the filters.



5. Centrifuge the filtrate at 1200 x g for 20 min. at 4°C to pellet the nuclei.

6. Discard the dark green supernatant. The pellet will have several layers to it. One is
starch; it is white and gritty. The nuclei are gray and soft. In the early steps, there
may be a dark green and somewhat viscous layer of chloroplasts.

Wash the pellets in about 25 ml cold H buffer (with Triton X-100) and resuspend by
swirling gently and pipetting. After the pellets are resuspended,
pellet the nuclei again at 1200 - 1300 x g. Discard the supernatant.

Repeat the wash 3-4 times until the supernatant has changed from a dark green to a
pale green. This usually happens after 3 or 4 resuspensions. At this point, the pellet
is typically grayish white and very slippery. The Triton X-100 in these repeated steps
helps to destroy the chloroplasts and mitochondria that contaminate the prep.

Resuspend the nuclei for a final time in a total of 15 ml of H buffer and transfer the
suspension to a sterile 125 ml Erlenmeyer flask.

7. Add 15 ml, dropwise, cold 2% Sarkosyl, 0.1 M Tris, 0.04 M EDTA solution (pH 9.5)
while swirling gently. This lyses the nuclei. The solution will become very viscous.

8. Add 30 grams of CsCl and gently swirl at room temperature until the CsCl is in
solution. The mixture will be gray, white and viscous.

9. Centrifuge the solution at 11,400 x g at 4°C for at least 30 min. The longer this spin
is, the firmer the protein pellicle.

10. The result is typically a clear green supernatant over a white pellet, and (perhaps)
under a protein pellicle. Carefully remove the solution under the protein pellicle and
above the pellet. Determine the density of the solution by weighing 1 ml of solution
and add CsCl if necessary to bring to 1.57 g/ml. The solution contains dissolved
solids (sucrose etc.) and the refractive index alone will not be an accurate guide to
CsCl concentration.
11. Add 20 µl of 10 mg/ml EtBr per ml of solution.

12. Centrifuge at 184,000 x g for 16 to 20 hours in a fixed-angle rotor.

13. Remove the dark red supernatant that is at the top of the tube with a plastic transfer
pipette and discard. Carefully remove the DNA band with another transfer pipette.
The DNA band is usually visible in room light; otherwise, use a long wave UV light
to locate the band.

14. Extract the ethidium bromide with isopropanol saturated with water and salt. Once
the solution is clear, extract at least two more times to ensure that all of the EtBr is
gone. Be very gentle, as it is very easy to shear the DNA at this step. This extraction
may take a while because the DNA solution tends to be very viscous. If the solution
is too viscous, dilute it with TE.

15. Dialyze the DNA for at least two days against several changes (at least three times) of
TE (10 mM Tris, 1 mM EDTA, pH 8) to remove the cesium chloride.

16. Remove the dialyzed DNA from the tubing. If the dialyzed DNA solution contains a
lot of debris, centrifuge the DNA solution at least at 2500 x g for 10 min. and
carefully transfer the clear supernatant to a new tube. Read the A260 concentration of
the DNA.

17. Assess the quality of the DNA by agarose gel electrophoresis (1% agarose gel) of the
DNA. Load 50 ng and 100 ng (based on the OD reading) and compare it with known
and good quality DNA. Undigested lambda DNA and a lambda-HindIII-digested
DNA are good molecular weight markers.

Protocol for Digestion of Genomic DNA

Protocol:

1. The relative amounts of DNA for different crop plants that provide approximately a
balanced number of genome equivalents are given in Table 3. Note that due to the size
of the wheat genome, wheat DNA will be underrepresented. Lambda DNA provides
a useful control for complete digestion.

2. Precipitate the DNA by adding 3 volumes of 100% ethanol. Incubate at -20°C for at
least two hours. Yeast DNA can be purchased and made up at the necessary
concentration, therefore no precipitation is necessary for yeast DNA.

3. Centrifuge the solution at 11,400 x g for 20 min. Decant the ethanol carefully (be
careful not to disturb the pellet). Be sure that the residual ethanol is completely
removed either by vacuum desiccation or by carefully wiping the sides of the tubes
with a clean tissue.

4. Resuspend the pellet in an appropriate volume of water. Be sure the pellet is fully
resuspended before proceeding to the next step. This may take about 30 min.

5. Add the appropriate volume of 10X reaction buffer provided by the manufacturer of
the restriction enzyme to the resuspended DNA followed by the appropriate volume
of enzymes. Be sure to mix it properly by slowly swirling the tubes.

6. Set up the lambda digestion-control for each DNA that you are digesting.

7. Incubate both the experimental and lambda digests overnight at 37°C. Spin down
condensation in a microfuge before proceeding.

8. After digestion, add 2 µl of loading dye (typically 0.25% bromophenol blue, 0.25%
xylene cyanol in 15% Ficoll or 30% glycerol) to the lambda-control digests and load
in 1% TPE-agarose gel (TPE is 90 mM Tris-phosphate, 2 mM EDTA, pH 8). If the
lambda DNA in the lambda control digests is completely digested, proceed with the
precipitation of the genomic DNA in the digests.

9. Precipitate the digested DNA by adding 3 volumes of 100% ethanol and incubating at
-20°C for at least 2 hours (preferably overnight).

EXCEPTION: Arabidopsis and yeast DNA are digested in an appropriate volume;
they don't have to be precipitated.



10. Resuspend the DNA in an appropriate volume of TE (e.g., 22 µl x 50 blots = 1100 µl)
and an appropriate volume of 10X loading dye (e.g., 2.4 µl x 50 blots = 120 µl). Be
careful in pipetting the loading dye - it is viscous. Be sure you are pipetting the
correct volume.



Table 3

Some guide points in digesting genomic DNA.

Species       Genome      Size Relative    Genome Equivalent to    Amount of DNA
              Size        to Arabidopsis   2 µg Arabidopsis DNA    per blot
Arabidopsis   120 Mb      1X               1X                      2 µg
Brassica      1,100 Mb    9.2X             0.54X                   10 µg
Corn          2,800 Mb    23.3X            0.43X                   20 µg
Cotton        2,300 Mb    19.2X            0.52X                   20 µg
Oat           11,300 Mb   94X              0.11X                   20 µg
Rice          400 Mb      3.3X             0.75X                   5 µg
Soybean       1,100 Mb    9.2X             0.54X                   10 µg
Sugarbeet     758 Mb      6.3X             0.8X                    10 µg
Sweetclover   1,100 Mb    9.2X             0.54X                   10 µg
Wheat         16,000 Mb   133X             0.08X                   20 µg
Yeast         15 Mb       0.12X            1X                      0.25 µg
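
The genome-equivalent column of Table 3 can be reproduced from the genome sizes and per-blot amounts. The following is a minimal sketch of that arithmetic, assuming the column means "genome copies loaded, relative to 2 µg of Arabidopsis DNA"; the function and variable names are illustrative only.

```python
ARABIDOPSIS_MB = 120.0   # genome size used as the reference in Table 3
ARABIDOPSIS_UG = 2.0     # reference DNA amount per blot

def genome_equivalents(genome_mb, dna_ug):
    """Genome copies loaded per blot, relative to 2 µg of Arabidopsis DNA."""
    relative_size = genome_mb / ARABIDOPSIS_MB
    return (dna_ug / ARABIDOPSIS_UG) / relative_size

# (genome size in Mb, µg DNA per blot) for a few rows of Table 3.
TABLE3_ROWS = {
    "Brassica": (1100, 10),
    "Corn": (2800, 20),
    "Rice": (400, 5),
    "Wheat": (16000, 20),
    "Yeast": (15, 0.25),
}

equivalents = {species: genome_equivalents(mb, ug) for species, (mb, ug) in TABLE3_ROWS.items()}
for species, value in equivalents.items():
    print(f"{species}: {value:.2f}X")
```

The wheat row illustrates the note in the digestion protocol: even at 20 µg per blot, the wheat genome is represented at well under a tenth of the Arabidopsis reference.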



Protocol for Southern Blot Analysis 

The digested DNA samples are electrophoresed in 1% agarose gels in 1X TPE buffer.
Low-voltage, overnight separations are preferred. The gels are stained with EtBr and
photographed.

1. For blotting the gels, first incubate the gel in 0.25 N HCl (with gentle shaking) for
about 15 min.

2. Then briefly rinse with water. The DNA is denatured by 2 incubations. Incubate
(with shaking) in 0.5 M NaOH in 1.5 M NaCl for 15 min.

3. The gel is then briefly rinsed in water and neutralized by incubating twice (with
shaking) in 1.5 M Tris pH 7.5 in 1.5 M NaCl for 15 min.

4. A nylon membrane is prepared by soaking it in water for at least 5 min, then in 6X
SSC for at least 15 min. before use. (20X SSC is 175.3 g NaCl, 88.2 g sodium citrate
per liter, adjusted to pH 7.0.)

5. The nylon membrane is placed on top of the gel and all bubbles in between are
removed. The DNA is blotted from the gel to the membrane using an absorbent
medium, such as paper toweling, and 6X SSC buffer. After the transfer, the membrane
may be lightly brushed with a gloved hand to remove any agarose sticking to the
surface.

6. The DNA is then fixed to the membrane by UV crosslinking and baking at 80°C. The
membrane is stored at 4°C until use.



B. Protocol for PCR Amplification of Genomic Fragments in Arabidopsis

Amplification procedures:

1. Mix the following in a 0.20 ml PCR tube or 96-well PCR plate:

Volume        Stock                                      Final Amount or Conc.
0.5 µl        ~10 ng/µl genomic DNA¹                     5 ng
2.5 µl        10X PCR buffer                             20 mM Tris, 50 mM KCl
0.75 µl       50 mM MgCl2                                1.5 mM
1 µl          10 pmol/µl Primer 1 (Forward)              10 pmol
1 µl          10 pmol/µl Primer 2 (Reverse)              10 pmol
0.5 µl        5 mM dNTPs                                 0.1 mM
0.1 µl        5 units/µl Platinum Taq™ (Life             1 unit
              Technologies, Gaithersburg, MD)
              DNA Polymerase
(to 25 µl)    Water

¹ Arabidopsis DNA is used in the present experiment, but the procedure is a general one.





2. The template DNA is amplified using a Perkin Elmer 9700 PCR machine:

1) 94°C for 10 min. followed by

2) 5 cycles:       3) 5 cycles:       4) 25 cycles:
94°C - 30 sec      94°C - 30 sec      94°C - 30 sec
62°C - 30 sec      58°C - 30 sec      53°C - 30 sec
72°C - 3 min       72°C - 3 min       72°C - 3 min

5) 72°C for 7 min. Then the reactions are stopped by chilling to 4°C.

The procedure can be adapted to a multi-well format if necessary.
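
For a multi-well run, the per-reaction volumes from step 1 of the amplification procedure can be scaled into a shared master mix, with template added per well. The sketch below shows one way to do that arithmetic. The 10% pipetting overage, the function names, and the choice to keep template out of the mix are assumptions for illustration, not part of the procedure.

```python
REACTION_UL = 25.0   # final reaction volume from the mix table
TEMPLATE_UL = 0.5    # genomic DNA, added to each well separately in this sketch

# Per-reaction volumes (µl) of the shared reagents, taken from the mix table.
PER_REACTION_UL = {
    "10X PCR buffer": 2.5,
    "50 mM MgCl2": 0.75,
    "primer 1 (10 pmol/µl)": 1.0,
    "primer 2 (10 pmol/µl)": 1.0,
    "5 mM dNTPs": 0.5,
    "Platinum Taq (5 units/µl)": 0.1,
}

def master_mix(n_reactions, overage=0.10):
    """Volumes (µl) for a shared mix; the 10% default overage is an assumption."""
    scale = n_reactions * (1.0 + overage)
    mix = {name: volume * scale for name, volume in PER_REACTION_UL.items()}
    water_per_reaction = REACTION_UL - TEMPLATE_UL - sum(PER_REACTION_UL.values())
    mix["water"] = water_per_reaction * scale
    return mix

def final_concentration(stock, volume_ul):
    """Final concentration after diluting volume_ul of stock into one reaction."""
    return stock * volume_ul / REACTION_UL

plate_mix = master_mix(96)
for name, volume in plate_mix.items():
    print(f"{name}: {volume:.1f} µl")
print(f"MgCl2 final: {final_concentration(50.0, 0.75)} mM")
print(f"dNTPs final: {final_concentration(5.0, 0.5)} mM")
```

Each well would then receive 24.5 µl of mix plus 0.5 µl of template under these assumptions.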
Quantification and Dilution of PCR Products: 

1. The product of the PCR is analyzed by electrophoresis in a 1% agarose gel. A
linearized plasmid DNA can be used as a quantification standard (usually at 50, 100,
200, and 400 ng). These will be used as references to approximate the amount of
PCR products. HindIII-digested Lambda DNA is useful as a molecular weight
marker. The gel can be run fairly quickly; e.g., at 100 volts. The standard gel is
examined to determine that the size of the PCR products is consistent with the
expected size and if there are significant extra bands or smeary products in the PCR
reactions.

2. The amounts of PCR products can be estimated on the basis of the plasmid standard.

3. For the small number of reactions that produce extraneous bands, a small amount of
DNA from bands with the correct size can be isolated by dipping a sterile 10-µl tip
into the band while viewing through a UV Transilluminator. The small amount of
agarose gel (with the DNA fragment) is used in the labeling reaction.

C. Protocol for PCR-DIG-Labeling of DNA

Solutions:

Reagents in PCR reactions (diluted PCR products, 10X PCR Buffer, 50 mM MgCl2, 5
U/µl Platinum Taq Polymerase, and the primers)

10X dNTP + DIG-11-dUTP [1:5]: (2 mM dATP, 2 mM dCTP, 2 mM dGTP, 1.65
mM dTTP, 0.35 mM DIG-11-dUTP)

10X dNTP + DIG-11-dUTP [1:10]: (2 mM dATP, 2 mM dCTP, 2 mM dGTP, 1.81
mM dTTP, 0.19 mM DIG-11-dUTP)

10X dNTP + DIG-11-dUTP [1:15]: (2 mM dATP, 2 mM dCTP, 2 mM dGTP, 1.875
mM dTTP, 0.125 mM DIG-11-dUTP)

TE buffer (10 mM Tris, 1 mM EDTA, pH 8)

Maleate buffer: In 700 ml of deionized distilled water, dissolve 11.61 g maleic acid
and 8.77 g NaCl. Add NaOH to adjust the pH to 7.5. Bring the volume to 1 L. Stir
for 15 min. and sterilize.

10% blocking solution: In 80 ml deionized distilled water, dissolve 1.16 g maleic
acid. Next, add NaOH to adjust the pH to 7.5. Add 10 g of the blocking reagent
powder (Boehringer Mannheim, Indianapolis, IN, Cat. no. 1096176). Heat to 60°C
while stirring to dissolve the powder. Adjust the volume to 100 ml with water. Stir
and sterilize.

1% blocking solution: Dilute the 10% stock to 1% using the maleate buffer.

Buffer 3 (100 mM Tris, 100 mM NaCl, 50 mM MgCl2, pH 9.5). Prepared from
autoclaved solutions of 1 M Tris pH 9.5, 5 M NaCl, and 1 M MgCl2 in autoclaved
distilled water.
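
The three labeling mixes differ only in how much of the thymidine pool is replaced by DIG-11-dUTP. As a rough check, the sketch below computes the total dTTP + DIG-11-dUTP concentration and the dTTP:DIG ratio for each listed 10X mix. The dictionary and function names are illustrative, and the nominal labels (1:5, 1:10, 1:15) are taken from the solution list as written.

```python
# (dTTP mM, DIG-11-dUTP mM) in each 10X mix, as listed in the solutions.
DIG_MIXES = {
    "1:5": (1.65, 0.35),
    "1:10": (1.81, 0.19),
    "1:15": (1.875, 0.125),
}

def thymidine_pool(dttp_mm, dig_mm):
    """Total concentration (mM) of dTTP plus DIG-11-dUTP in a mix."""
    return dttp_mm + dig_mm

def dttp_per_dig(dttp_mm, dig_mm):
    """Moles of unlabeled dTTP per mole of DIG-11-dUTP."""
    return dttp_mm / dig_mm

for label, (dttp, dig) in DIG_MIXES.items():
    print(f"[{label}] pool = {thymidine_pool(dttp, dig):.3f} mM, "
          f"dTTP:DIG = {dttp_per_dig(dttp, dig):.1f}:1")
```

In every mix the pool matches the 2 mM used for the other three nucleotides. Only the 1:15 mix gives an exact 15:1 ratio; the other two labels are rounded.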
Procedure:

1. PCR reactions are performed in 25 µl volumes containing:

PCR buffer                    1X
MgCl2                         1.5 mM
10X dNTP + DIG-11-dUTP        1X (please see the note below)
Platinum Taq™ Polymerase      1 unit
10 pg probe DNA
10 pmol primer 1

Note:                               Use for:
10X dNTP + DIG-11-dUTP (1:5)        < 1 kb
10X dNTP + DIG-11-dUTP (1:10)       1 kb to 1.8 kb
10X dNTP + DIG-11-dUTP (1:15)       > 1.8 kb

2. The PCR reaction uses the following amplification cycles:

1) 94°C for 10 min.

2) 5 cycles:       3) 5 cycles:       4) 25 cycles:
95°C - 30 sec      95°C - 30 sec      95°C - 30 sec
61°C - 1 min       59°C - 1 min       51°C - 1 min
73°C - 5 min       73°C - 5 min       73°C - 5 min

5) Final extension for 8 min. The reactions are terminated by chilling to 4°C (hold).



3. The products are analyzed by electrophoresis in a 1% agarose gel, comparing to an
aliquot of the unlabelled probe starting material.



4. The amount of DIG-labeled probe is determined as follows: 
Make serial dilutions of the diluted control DNA in dilution buffer (TE: 10 mM Tris
and 1 mM EDTA, pH 8) as shown in the following table:

DIG-labeled control     Stepwise Dilution      Final Conc.
DNA starting conc.                             (Dilution Name)
5 ng/µl                 1 µl in 49 µl TE       100 pg/µl (A)
100 pg/µl (A)           25 µl in 25 µl TE      50 pg/µl (B)
50 pg/µl (B)            25 µl in 25 µl TE      25 pg/µl (C)
25 pg/µl (C)            20 µl in 30 µl TE      10 pg/µl (D)

a. Serial dilutions of a DIG-labeled standard DNA ranging from 100 pg to 10 pg
are spotted onto a positively charged nylon membrane, marking the membrane
lightly with a pencil to identify each dilution.

b. Serial dilutions (e.g., 1:50, 1:2500, 1:10,000) of the newly labeled DNA probe
are spotted.

c. The membrane is fixed by UV crosslinking.

d. The membrane is wetted with a small amount of maleate buffer and then
incubated in 1% blocking solution for 15 min at room temp.

e. The labeled DNA is then detected using alkaline phosphatase conjugated anti-DIG
antibody (Boehringer Mannheim, Indianapolis, IN, cat. no. 1093274) and
an NBT substrate according to the manufacturer's instructions.

f. Spot intensities of the control and experimental dilutions are then compared to
estimate the concentration of the PCR-DIG-labeled probe.
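
The control-DNA dilution series above is a chain of simple dilutions, each starting from the previous tube. A minimal sketch of that chain follows; the helper name `dilute` and the data layout are illustrative only.

```python
def dilute(concentration, sample_ul, diluent_ul):
    """Concentration after mixing sample_ul of sample with diluent_ul of TE."""
    return concentration * sample_ul / (sample_ul + diluent_ul)

# (dilution name, µl taken from the previous tube, µl TE) from the dilution table.
DILUTION_STEPS = [("A", 1, 49), ("B", 25, 25), ("C", 25, 25), ("D", 20, 30)]

concentration = 5000.0  # starting control DNA: 5 ng/µl, expressed in pg/µl
series = {}
for name, sample_ul, te_ul in DILUTION_STEPS:
    concentration = dilute(concentration, sample_ul, te_ul)
    series[name] = concentration

for name, value in series.items():
    print(f"Dilution {name}: {value:g} pg/µl")
```

Because each step feeds the next, an error in one tube propagates to all later dilutions, which is one reason to compare several spots rather than a single one.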
D. Prehybridization and Hybridization of Southern Blots

Solutions:

100% Formamide     purchased from Gibco

20X SSC            (1X = 0.15 M NaCl, 0.015 M Na3citrate)
per L:             175 g NaCl
                   87.5 g Na3citrate·2H2O

20% Sarkosyl (N-lauroyl-sarcosine)
20% SDS (sodium dodecyl sulphate)

10% Blocking Reagent: In 80 ml deionized distilled water, dissolve 1.16 g maleic
acid. Next, add NaOH to adjust the pH to 7.5. Add 10 g of the blocking reagent
powder. Heat to 60°C while stirring to dissolve the powder. Adjust the volume
to 100 ml with water. Stir and sterilize.

Prehybridization Mix:

Final              Components           Volume           Stock
Concentration                           (per 100 ml)
50%                Formamide            50 ml            100%
5X                 SSC                  25 ml            20X
0.1%               Sarkosyl             0.5 ml           20%
0.02%              SDS                  0.1 ml           20%
2%                 Blocking Reagent     20 ml            10%
                   Water                4.4 ml
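
The volumes in the prehybridization mix follow from the usual dilution relation, stock volume = (final concentration / stock concentration) x total volume, with water making up the remainder. The sketch below recomputes them; the names are illustrative, and units are treated as matching within each row (%, or X for SSC).

```python
TOTAL_ML = 100.0

# (component, final concentration, stock concentration), same unit within each row.
PREHYB_RECIPE = [
    ("Formamide", 50.0, 100.0),
    ("SSC", 5.0, 20.0),
    ("Sarkosyl", 0.1, 20.0),
    ("SDS", 0.02, 20.0),
    ("Blocking Reagent", 2.0, 10.0),
]

def stock_volume_ml(final, stock, total_ml=TOTAL_ML):
    """Volume of stock needed to reach the final concentration in total_ml."""
    return final / stock * total_ml

volumes = {name: stock_volume_ml(final, stock) for name, final, stock in PREHYB_RECIPE}
volumes["Water"] = TOTAL_ML - sum(volumes.values())

for name, ml in volumes.items():
    print(f"{name}: {ml:.1f} ml")
```

Passing a different `total_ml` scales the recipe, for example when only a few blots are being hybridized.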





General Procedures:

1. Place the blot in a heat-sealable plastic bag and add an appropriate volume of
prehybridization solution (30 ml/100 cm²) at room temperature. Seal the bag with a
heat sealer, avoiding bubbles as much as possible. Lay down the bags in a large
plastic tray (one tray can accommodate at least 4-5 bags). Ensure that the bags are
lying flat in the tray so that the prehybridization solution is evenly distributed
throughout the bag. Incubate the blot for at least 2 hours with gentle agitation using a
waver shaker.

2. Denature DIG-labeled DNA probe by incubating for 10 min. at 98°C using the PCR
machine and immediately cool it to 4°C.

3. Add probe to prehybridization solution (25 ng/ml; 30 ml = 750 ng total probe) and
mix well but avoid foaming. Bubbles may lead to background.

4. Pour off the prehybridization solution from the hybridization bags and add new
prehybridization and probe solution mixture to the bags containing the membrane.

5. Incubate with gentle agitation for at least 16 hours.

6. Proceed to medium stringency post-hybridization wash:

Three times for 20 min. each with gentle agitation using 1X SSC, 1% SDS at 60°C.

All wash solutions must be prewarmed to 60°C. Use about 100 ml of wash solution
per membrane.

To avoid background, keep the membranes fully submerged to avoid drying in spots;
agitate sufficiently to avoid having membranes stick to one another.

7. After the wash, proceed to immunological detection and CSPD development.
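
The solution volume and probe mass in steps 1 and 3 scale with membrane area. A small sketch of that scaling is given below, using the stated 30 ml per 100 cm² and 25 ng/ml; the function names and the 150 cm² example membrane are hypothetical.

```python
def hybridization_volume_ml(membrane_area_cm2, ml_per_100_cm2=30.0):
    """Solution volume for a membrane, at the stated 30 ml per 100 cm²."""
    return membrane_area_cm2 / 100.0 * ml_per_100_cm2

def probe_mass_ng(volume_ml, ng_per_ml=25.0):
    """Total DIG-labeled probe for a given hybridization volume, at 25 ng/ml."""
    return volume_ml * ng_per_ml

# The worked case in step 3: a 100 cm² membrane.
reference_volume = hybridization_volume_ml(100)
reference_probe = probe_mass_ng(reference_volume)
print(f"100 cm²: {reference_volume:g} ml, {reference_probe:g} ng probe")

# A hypothetical 150 cm² membrane, for illustration only.
larger_volume = hybridization_volume_ml(150)
print(f"150 cm²: {larger_volume:g} ml, {probe_mass_ng(larger_volume):g} ng probe")
```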

E. Procedure for Immunological Detection with CSPD

Solutions:

Buffer 1: Maleic acid buffer (0.1 M maleic acid, 0.15 M NaCl;
adjusted to pH 7.5 with NaOH)

Washing buffer: Maleic acid buffer with 0.3% (v/v) Tween 20.

Blocking stock solution (10X concentration): 10% blocking reagent in Buffer 1. Dissolve
blocking reagent powder (Boehringer Mannheim, Indianapolis, IN, cat. no. 1096176) by
constantly stirring on a 65°C heating block or heat in a
microwave, autoclave and store at 4°C.

Buffer 2 (1X blocking solution): Dilute the stock solution 1:10 in Buffer 1.

Detection buffer: 0.1 M Tris, 0.1 M NaCl, pH 9.5

Procedure:

1. After the post-hybridization wash the blots are briefly rinsed (1-5 min.) in the maleate
washing buffer with gentle shaking.

2. Then the membranes are incubated for 30 min. in Buffer 2 with gentle shaking.

3. Anti-DIG-AP conjugate (Boehringer Mannheim, Indianapolis, IN, cat. no. 1093274)
at 75 mU/ml (1:10,000) in Buffer 2 is used for detection. 75 ml of solution can be
used for 3 blots.

4. The membrane is incubated for 30 min. in the antibody solution with gentle shaking.

5. The membranes are washed twice in washing buffer with gentle shaking. About 250
ml is used per wash for 3 blots.

6. The blots are equilibrated for 2-5 min in 60 ml detection buffer.

7. Dilute CSPD (1:200) in detection buffer. (This can be prepared ahead of time and
stored in the dark at 4°C.)

The following steps must be done individually. Bags (one for detection and one for
exposure) are generally cut and ready before doing the following steps.

8. The blot is carefully removed from the detection buffer and excess liquid removed 
without drying the membrane. The blot is immediately placed in a bag and 1.5 ml of 
CSPD solution is added. The CSPD solution can be spread over the membrane. 
Bubbles present at the edge and on the surface of the blot are typically removed by 
gentle rubbing. The membrane is incubated for 5 min. in CSPD solution. 

9. Excess liquid is removed and the membrane is blotted briefly (DNA side up) on 
Whatman 3MM paper. Do not let the membrane dry completely. 

10. Seal the damp membrane in a hybridization bag and incubate for 10 min. at 37°C to 
enhance the luminescent reaction. 

11. Expose for 2 hours at room temperature to X-ray film. Multiple exposures can be 
taken. Luminescence continues for at least 24 hours and signal intensity increases 
during the first hours. 
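
The antibody and CSPD dilutions in the procedure above reduce to simple arithmetic. The sketch below is a hedged illustration: it assumes that "1:N" means one part stock in N parts final volume, and the helper name `stock_volume_ul` is hypothetical.

```python
def stock_volume_ul(final_volume_ml, dilution_factor):
    """Microliters of stock needed for final_volume_ml at a 1:dilution_factor dilution.

    Assumes 1:N means one part stock in N parts of final solution.
    """
    return final_volume_ml * 1000.0 / dilution_factor


# Anti-DIG-AP conjugate: 75 ml of 1:10,000 working solution (enough for 3 blots).
antibody_ul = stock_volume_ul(75, 10_000)

# CSPD: 1.5 ml of 1:200 working solution per blot.
cspd_ul_per_blot = stock_volume_ul(1.5, 200)

# Stock activity implied by 75 mU/ml at 1:10,000, converted from mU/ml to U/ml.
implied_stock_u_per_ml = 75 * 10_000 / 1000

if __name__ == "__main__":
    print(f"Antibody conjugate: {antibody_ul} ul in 75 ml Buffer 2")
    print(f"CSPD: {cspd_ul_per_blot} ul in 1.5 ml detection buffer per blot")
    print(f"Implied conjugate stock: {implied_stock_u_per_ml} U/ml")
```

Under these assumptions both the conjugate (for three blots) and the CSPD (per blot) come to 7.5 µl of stock.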

Example 3: Transformation of Carrot Cells 

Transformation of plant cells can be accomplished by a number of methods, as 
described above. Similarly, a number of plant genera can be regenerated from tissue culture 
following transformation. Transformation and regeneration of carrot cells as described herein 
is illustrative. 

Single cell suspension cultures of carrot (Daucus carota) cells are established from 
hypocotyls of cultivar Early Nantes in B5 growth medium (O.L. Gamborg et al., Plant 
Physiol. 45:372 (1970)) plus 2,4-D and 15 mM CaCl2 (B5-44 medium) by methods known in 
the art. The suspension cultures are subcultured by adding 10 ml of the suspension culture to 
40 ml of B5-44 medium in 250 ml flasks every 7 days and are maintained in a shaker at 150 
rpm at 27°C in the dark. 

The suspension culture cells are transformed with exogenous DNA as described by Z. 
Chen et al., Plant Mol. Biol. 36:163 (1998). Briefly, 4-days post-subculture cells are incubated 
with cell wall digestion solution containing 0.4 M sorbitol, 2% driselase, 5 mM MES (2-[N- 
Morpholino]ethanesulfonic acid) pH 5.0 for 5 hours. The digested cells are pelleted gently 
at 60 x g for 5 min. and washed twice in W5 solution containing 154 mM NaCl, 5 mM KCl, 
125 mM CaCl2 and 5 mM glucose, pH 6.0. The protoplasts are suspended in MC solution 
containing 5 mM MES, 20 mM CaCl2, 0.5 M mannitol, pH 5.7 and the protoplast density is 
adjusted to about 4 x 10^ protoplasts per ml. 

15-60 µg of plasmid DNA is mixed with 0.9 ml of protoplasts. The resulting 
suspension is mixed with 40% polyethylene glycol (MW 8000, PEG 8000) by gentle 
inversion a few times at room temperature for 5 to 25 min. Protoplast culture medium known 
in the art is added to the PEG-DNA-protoplast mixture. Protoplasts are incubated in the 
culture medium for 24 hours to 5 days and cell extracts can be used for assay of transient 
expression of the introduced gene. Alternatively, transformed cells can be used to produce 
transgenic callus, which in turn can be used to produce transgenic plants, by methods known 
in the art. See, for example, Nomura and Komamine, Plant Physiol. 79:988-991 (1985), 
Identification and Isolation of Single Cells that Produce Somatic Embryos in Carrot 
Suspension Cultures. 

An additional deposit, PTA-1411, of an E. coli Library, E. coli LibA021800, was 
made at the American Type Culture Collection in Manassas, Virginia, USA on February 22, 
2000 to meet the requirements of the Budapest Treaty for the international recognition of the 
deposit of microorganisms. This deposit was assigned ATCC accession no. PTA-1411. 

The invention being thus described, it will be apparent to one of ordinary skill in the 
art that various modifications of the materials and methods for practicing the invention can be 
made. Such modifications are to be considered within the scope of the invention as defined 
by the following claims. 

Each of the references from the patent and periodical literature cited herein is hereby 
expressly incorporated in its entirety by such citation. 

>6006847 
len = 101410 
nex = 200 

Type    Begin   End     Strand  Gene 
Intr    101     172     +       1 
Intr    302     396     +       1 
Intr    465     531     +       1 
Intr    615     744     +       1 
Intr    827     1045    +       1 
Intr    1129    1259    +       1 
Intr    1556    1730    +       1 
Intr    1821    1911    +       1 
Intr    2009    2166    +       1 
Intr    2262    2450    +       1 
Intr    2674    2733    +       1 
Intr    3297    3374    +       1 
Intr    3687    3792    +       1 
Term    4121    4131    +       1 
Init    4221    4285    +       2 
Intr    4374    4502    +       2 
Intr    4588    4762    +       2 
Intr    4853    4943    +       2 
Intr    5021    5178    +       2 
Intr    5257    5445    +       2 
Term    5537    5593    +       2 
Init    6652    6829    +       3 
Intr    7207    7355    +       3 
Intr    7433    7548    +       3 
Term    7743    7908    +       3 
Term    8370    8185    -       4 
Intr    8544    8458    -       4 
Intr    8706    8631    -       4 
Intr    9041    8791    -       4 
Intr    9232    9116    -       4 
Intr    9410    9336    -       4 
Intr    9714    9493    -       4 
Intr    9912    9799    -       4 
Init    10118   10011   -       4 
Single  12038   12232   +       5 
Init    12760   12883   +       6 
Intr    12994   13143   +       6 
Intr    13183   13269   +       6 
Intr    13397   13542   +       6 
Intr    13609   13693   +       6 
Intr    13804   13850   +       6 
Intr    13955   14036   +       6 
Term    14493   14779   +       6 
Init    15035   15196   +       7 
Intr    15533   15595   +       7 
Intr    15703   15789   +       7 
Intr    15943   16041   +       7 
Intr    16129   16224   +       7 
Intr    16303   16368   +       7 
Intr    16477   16707   +       7 
Intr    16788   16853   +       7 
Intr    16991   17041   +       7 
Intr    17126   17190   +       7 
Term    17313   17364   +       7 
Term    18055   18002   -       8 
Intr    18253   18176   -       8 
Intr    18652   18360   -       8 
Init    18908   18752   -       8 
Term    19967   19961   -       9 
Init    21511   20829   -       9 
Term    23137   22716   -       10 
Intr    23400   23276   -       10 
Intr    24007   23826   -       10 
Init    24566   24393   -       10 
Term    27663   27657   -       11 
Intr    28142   27844   -       11 
Intr    28355   28243   -       11 
Intr    29372   28495   -       11 
Init    29979   29504   -       11 
Init    32376   32405   +       12 
Term    33427   33504   +       12 
Single  33834   33553   -       13 
Single  34444   34839   +       14 
Single  35773   35486   -       15 
Single  36634   36912   +       16 
Init    39808   40074   +       17 
Intr    40685   40767   +       17 
Intr    41096   41294   +       17 
Intr    41376   41460   +       17 
Intr    41781   41851   +       17 
Intr    41937   42025   +       17 
Intr    42154   42397   +       17 
Intr    42608   42720   +       17 
Intr    43007   43097   +       17 
Intr    43243   43371   +       17 
Intr    43644   44197   +       17 
Intr    44412   44793   +       17 
Intr    44892   44928   +       17 
Term    45011   45162   +       17 
Term    45744   45571   -       18 
Intr    45893   45828   -       18 
Intr    46050   45970   -       18 
Intr    46198   46125   -       18 
Intr    46465   46405   -       18 
Intr    46611   46541   -       18 
Intr    46738   46696   -       18 
Intr    46915   46827   -       18 
Intr    47121   47016   -       18 
Intr    47313   47208   -       18 
Intr    47748   47701   -       18 
Intr    47911   47842   -       18 
Intr    48181   48145   -       18 
Intr    49398   49195   -       18 
Intr    50057   49515   -       18 
Intr    50822   50166   -       18 
Intr    51452   50955   -       18 
Intr    51762   51550   -       18 
Intr    52202   51913   -       18 
Intr    52533   52394   -       18 
Intr    53068   52996   -       18 
Intr    53355   53178   -       18 
Init    53836   53564   -       18 
Term    56659   56291   -       19 
Intr    57188   56736   -       19 
Intr    57376   57271   -       19 
Intr    57636   57470   -       19 
Intr    57824   57744   -       19 
Intr    58074   57911   -       19 
Intr    58571   58180   -       19 
Intr    58795   58667   -       19 
Intr    58979   58893   -       19 
Init    59429   59095   -       19 
Init    61808   61831   +       20 
Intr    61923   61998   +       20 
Term    62406   62887   +       20 
Term    63682   63453   -       21 
Intr    63845   63800   -       21 
Intr    63968   63940   -       21 
Intr    64321   64165   -       21 
Intr    64643   64463   -       21 
Init    65146   65142   -       21 
Init    68595   69930   +       22 
Intr    70025   70158   +       22 
Intr    70252   70492   +       22 
Intr    70589   70794   +       22 
Intr    70868   70911   +       22 
Intr    71080   71191   +       22 
Intr    71271   71583   +       22 
Intr    71706   71724   +       22 
Term    71836   71860   +       22 
Term    72768   72358   -       23 
Init    73399   72989   -       23 
Term    75920   75520   -       24 
Init    77055   76392   -       24 
Init    77311   77678   +       25 
Intr    78802   79147   +       25 
Intr    79247   79529   +       25 
Term    79612   79925   +       25 
Single  80711   81892   +       26 
Init    82567   82807   +       27 
Intr    82918   83174   +       27 
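
Each exon record in the listing above combines an exon type (Init, Intr, Term or Single), two coordinates, a strand and a gene number. The sketch below shows one way such records could be parsed and summed per gene. The column interpretation (begin, end, strand, gene) is inferred from the layout rather than stated in the text, minus-strand records are assumed to list the higher coordinate first, and the names `Exon`, `parse_row` and `coding_length_by_gene` are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Exon(NamedTuple):
    kind: str    # Init, Intr, Term or Single
    begin: int   # first coordinate as listed
    end: int     # second coordinate as listed
    strand: str  # "+" or "-"
    gene: int    # gene number (assumed meaning of the last column)


def parse_row(line):
    """Parse one whitespace-separated record: kind begin end strand gene."""
    kind, begin, end, strand, gene = line.split()
    return Exon(kind, int(begin), int(end), strand, int(gene))


def exon_length(exon):
    """Length in bp; abs() covers minus-strand records where begin > end."""
    return abs(exon.end - exon.begin) + 1


def coding_length_by_gene(exons):
    """Sum exon lengths for each gene number."""
    totals = defaultdict(int)
    for exon in exons:
        totals[exon.gene] += exon_length(exon)
    return dict(totals)


# A few records copied from the listing for sequence 6006847 (genes 3, 5 and 9).
SAMPLE = """\
Init 6652 6829 + 3
Intr 7207 7355 + 3
Intr 7433 7548 + 3
Term 7743 7908 + 3
Single 12038 12232 + 5
Term 19967 19961 - 9
Init 21511 20829 - 9
"""

EXONS = [parse_row(line) for line in SAMPLE.splitlines() if line.strip()]

if __name__ == "__main__":
    for gene, length in sorted(coding_length_by_gene(EXONS).items()):
        print(f"gene {gene}: {length} bp, multiple of 3: {length % 3 == 0}")
```

For complete genes, checking that the summed exon length is a multiple of three is a quick sanity test on a reconstructed row.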
Term    97489   97196 

>6006873 
len 
42 
65 
60 

Init 550 631 